ZFS Tuning for HPC
ARC
Operating systems have relied on local (and volatile) memory (e.g., DRAM) to cache file data for decades, with the ultimate goal of not having to touch the back-end storage device. Waiting for a disk drive to read the requested data can be painfully slow, so operating systems – and, in turn, filesystems – attempt to cache data content in the hopes of not accessing the underlying device. ZFS implements its own cache that is not based on a least recently used (LRU) replacement policy, referred to as the adaptive replacement cache (ARC). In a standard LRU cache, the least recently used page cache data is replaced with new cache data. ZFS is a bit more intelligent than this and maintains lists for:
- recently cached entries,
- recently cached entries that have been accessed more than once,
- entries evicted from the list of (1) recently cached entries, and
- entries evicted from the list of (2) recently cached entries that have been accessed more than once.
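To make the four lists concrete, the following is a deliberately simplified Python sketch of this structure (the names `SimplifiedARC`, `recent`, `frequent`, and the ghost lists are illustrative, not ZFS code; the real ARC also adaptively rebalances the two live lists, which is omitted here):

```python
from collections import OrderedDict

class SimplifiedARC:
    """Toy sketch of ARC-style caching: two live lists (recently
    cached; cached and accessed more than once) plus two ghost
    lists that remember what was evicted from each."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.recent = OrderedDict()          # recently cached entries
        self.frequent = OrderedDict()        # entries accessed more than once
        self.recent_ghost = OrderedDict()    # evicted from the recent list
        self.frequent_ghost = OrderedDict()  # evicted from the frequent list

    def access(self, key):
        if key in self.recent:               # second access promotes the entry
            self.recent.pop(key)
            self.frequent[key] = True
            return "hit"
        if key in self.frequent:             # refresh recency in the frequent list
            self.frequent.move_to_end(key)
            return "hit"
        # A ghost hit means something evicted earlier was needed again.
        ghost = ("ghost-recent" if self.recent_ghost.pop(key, None)
                 else "ghost-frequent" if self.frequent_ghost.pop(key, None)
                 else None)
        self._insert(key)
        return ghost or "miss"

    def _insert(self, key):
        if len(self.recent) + len(self.frequent) >= self.capacity:
            # Evict from the larger live list; remember the victim as a ghost.
            if len(self.recent) >= len(self.frequent):
                old, _ = self.recent.popitem(last=False)
                self.recent_ghost[old] = True
            else:
                old, _ = self.frequent.popitem(last=False)
                self.frequent_ghost[old] = True
        self.recent[key] = True

cache = SimplifiedARC(2)
for key in ["a", "a", "b", "c", "b"]:
    # the final access to "b" is a ghost hit: evicted, then requested again
    print(key, "->", cache.access(key))
```

The ghost lists hold only keys, not data; they let the cache notice when its own eviction decisions are costing it rereads, which is what the real ARC uses to adapt.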
Caching reads well is a difficult task. Predicting which data needs to remain in cache is not always possible, and because read I/O profiles tend to be randomized, the likelihood is high that data will be evicted before it is needed again and then have to be reread into cache.
The amount of memory the ARC can use on your local system can be managed in multiple ways. For instance, if you want to cap it at 4GB, you can insert that into the ZFS module with the zfs_arc_max parameter:
$ sudo modprobe zfs zfs_arc_max=4294967296
Or, you can create a configuration file for modprobe called /etc/modprobe.d/zfs.conf and save the following content in it:
options zfs zfs_arc_max=4294967296
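The value 4294967296 is simply 4GiB expressed in bytes, which is quick to verify:

```python
# 4 GiB in bytes, matching the zfs_arc_max value above
print(4 * 1024**3)  # 4294967296
```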
You can verify the current setting of this parameter by viewing it under sysfs:
$ cat /sys/module/zfs/parameters/zfs_arc_max
0
Also, you can modify that same parameter over the same sysfs interface:
$ echo 4294967296 | sudo tee -a /sys/module/zfs/parameters/zfs_arc_max
4294967296
$ cat /sys/module/zfs/parameters/zfs_arc_max
4294967296
If you are ever interested in viewing the statistics of the ARC, they are all available in procfs (Listing 4).
Listing 4: ARC Statistics
$ cat /proc/spl/kstat/zfs/arcstats
13 1 0x01 96 26112 26975127196 517243166877
name                            type data
hits                            4    691
misses                          4    254
demand_data_hits                4    0
demand_data_misses              4    0
demand_metadata_hits            4    691
demand_metadata_misses          4    254
prefetch_data_hits              4    0
prefetch_data_misses            4    0
prefetch_metadata_hits          4    0
prefetch_metadata_misses        4    0
mru_hits                        4    88
mru_ghost_hits                  4    0
mfu_hits                        4    603
mfu_ghost_hits                  4    0
deleted                         4    0
mutex_miss                      4    0
access_skip                     4    0
evict_skip                      4    0
[ ... ]
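The hits and misses counters in this output can be turned into an overall cache hit ratio with a few lines of Python. The snippet below parses a sample containing the counters from Listing 4; on a live system you would read /proc/spl/kstat/zfs/arcstats instead:

```python
# Compute the ARC hit ratio from arcstats-style output.
# Sample counters taken from Listing 4; on a live system,
# read /proc/spl/kstat/zfs/arcstats instead.
sample = """\
name                            type data
hits                            4    691
misses                          4    254
"""

stats = {}
for line in sample.splitlines()[1:]:     # skip the header row
    name, _type, value = line.split()
    stats[name] = int(value)

ratio = 100 * stats["hits"] / (stats["hits"] + stats["misses"])
print(f"hit ratio: {ratio:.1f}%")        # 691 / (691 + 254) is roughly 73%
```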
The L2ARC
ZFS provides another, larger secondary layer for read caching. With a larger cache volume, you increase your chances of rereading valuable data content without hitting the slower device underneath. In ZFS, this is accomplished by adding an SSD to your pool. The Level 2 ARC (L2ARC) will host entries that are scanned from the “primary” ARC cache and are next to be evicted.
In my configuration, I have created two partitions on a local NVMe device:
$ cat /proc/partitions | grep nvme
 259        0  244198584 nvme0n1
 259        3   97654784 nvme0n1p1
 259        4   96679936 nvme0n1p2
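Note that /proc/partitions reports sizes in 1KiB blocks, so each of the two partitions works out to a bit over 90GiB:

```python
# /proc/partitions sizes are in 1 KiB blocks; convert to GiB
for name, blocks in [("nvme0n1p1", 97654784), ("nvme0n1p2", 96679936)]:
    print(f"{name}: {blocks / 1024**2:.1f} GiB")
```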
I will be using partition 1 for the L2ARC read cache; to enable it, I enter:
$ sudo zpool add myvol cache nvme0n1p1
Then, I verify that the cache volume has been added to the pool configuration (Listing 5).
Listing 5: Verify Pool Config 1
$ sudo zpool status
  pool: myvol
 state: ONLINE
  scan: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        myvol          ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            sdc        ONLINE       0     0     0
            sdd        ONLINE       0     0     0
            sde        ONLINE       0     0     0
            sdf        ONLINE       0     0     0
        cache
          nvme0n1p1    ONLINE       0     0     0

errors: No known data errors
Updates that enable a persistent L2ARC cache, one that can tolerate system reboots, are expected to make it into the mainline ZFS code soon.