Tuning ZFS for Speed on Linux

ZFS Tuning for HPC

The L2ARC

ZFS provides a second, larger layer for read caching. A larger cache raises the odds that data you reread is served without touching the slower devices underneath. In ZFS, you get this layer by adding an SSD to your pool as a cache device. The Level 2 ARC (L2ARC) holds entries that are scanned from the "primary" ARC cache and are next in line to be evicted.

In my configuration, I have created two partitions on a local NVMe device:

$ grep nvme /proc/partitions
 259        0  244198584 nvme0n1
 259        3   97654784 nvme0n1p1
 259        4   96679936 nvme0n1p2

I will be using partition 1 for the L2ARC read cache, so to enable it, I enter:

$ sudo zpool add myvol cache nvme0n1p1

Then, I verify that the cache volume has been added to the pool configuration (Listing 5).

Listing 5

Verify Pool Config 1

$ sudo zpool status
  pool: myvol
 state: ONLINE
  scan: none requested
config:
  NAME         STATE     READ WRITE CKSUM
  myvol        ONLINE       0     0     0
    raidz1-0   ONLINE       0     0     0
      sdc      ONLINE       0     0     0
      sdd      ONLINE       0     0     0
      sde      ONLINE       0     0     0
      sdf      ONLINE       0     0     0
  cache
    nvme0n1p1  ONLINE       0     0     0
errors: No known data errors
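Seeing the cache device in the pool configuration does not tell you whether it is serving reads. The following is a minimal sketch, assuming that ZFS on Linux exposes its ARC counters in /proc/spl/kstat/zfs/arcstats with one "name type data" row per counter and that the l2_hits and l2_misses rows are the ones of interest. The function name l2arc_hit_ratio is made up for illustration and is not part of ZFS.

```shell
#!/bin/sh
# l2arc_hit_ratio [FILE]: print the L2ARC hit and miss counters and the
# resulting hit percentage. FILE defaults to the arcstats kstat file
# (assumed location on ZFS on Linux).
#
# Usage: l2arc_hit_ratio
#        l2arc_hit_ratio /path/to/saved/arcstats
l2arc_hit_ratio() {
    stats_file="${1:-/proc/spl/kstat/zfs/arcstats}"
    awk '
        # Each counter row has three columns: name, type, value.
        $1 == "l2_hits"   { hits = $3 }
        $1 == "l2_misses" { misses = $3 }
        END {
            total = hits + misses
            if (total == 0) {
                # No L2ARC lookups yet (or no cache device attached).
                print "l2_hits=0 l2_misses=0 hit_ratio=n/a"
            } else {
                printf "l2_hits=%s l2_misses=%s hit_ratio=%.1f%%\n", \
                    hits, misses, 100 * hits / total
            }
        }
    ' "$stats_file"
}
```

A freshly added cache device starts out empty, so expect a low or undefined ratio until the working set has had time to spill out of the primary ARC.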

At the time of writing, patches that make the L2ARC persistent, so that its contents survive a system reboot, are close to landing in the mainline ZFS code.

The ZFS ZIL SLOG

The purpose of the ZFS Intent Log (ZIL) is to log synchronous I/O operations persistently to disk before they are written to the pool-managed array. The synchronous part is what ensures that every operation completes and is persisted to disk before an I/O completion status is returned to the application. You can think of it as a sort of "write cache." The separate intent log (SLOG), in turn, gives this write log a boost by placing it on a dedicated, faster device such as an SSD.

Remember how I had two separate partitions on the local NVMe device? The first partition was used for the L2ARC read cache; now, I will use the second partition for the SLOG write cache.

To add the NVMe device partition as the SLOG to the pool, enter:

$ sudo zpool add myvol log nvme0n1p2

Then, verify that the log device has been added to the pool configuration (Listing 6).

Listing 6

Verify Pool Config 2

$ sudo zpool status
  pool: myvol
 state: ONLINE
  scan: none requested
config:
  NAME         STATE     READ WRITE CKSUM
  myvol        ONLINE       0     0     0
    raidz1-0   ONLINE       0     0     0
      sdc      ONLINE       0     0     0
      sdd      ONLINE       0     0     0
      sde      ONLINE       0     0     0
      sdf      ONLINE       0     0     0
  logs
    nvme0n1p2  ONLINE       0     0     0
  cache
    nvme0n1p1  ONLINE       0     0     0
errors: No known data errors
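If you want to script this check instead of reading the listing by eye, the status output can be filtered by section. The sketch below assumes the layout shown above: a section name such as logs or cache alone on a line, followed by one line per device. The helper name pool_aux_devices is hypothetical, and for a mirrored log it would also print the mirror vdev name.

```shell
#!/bin/sh
# pool_aux_devices SECTION: read `zpool status` output on stdin and
# print the names listed under SECTION (for example "logs" or "cache").
#
# Usage: sudo zpool status myvol | pool_aux_devices logs
pool_aux_devices() {
    section="$1"
    awk -v section="$section" '
        # A section header is a single word on its own line.
        NF == 1 && $1 == section { in_section = 1; next }
        # The section ends at the next header, a blank line,
        # or a "key:" line such as "errors:".
        in_section && (NF <= 1 || $1 ~ /:$/) { in_section = 0 }
        # Everything in between is a device (or vdev) row.
        in_section { print $1 }
    '
}
```

An empty result means the pool has no devices in that section, which is a convenient condition to test for in a provisioning script.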

Now that you have added the NVMe partitions as the caches for both reads and writes, you can view basic metrics for those devices with the zpool iostat command (Listing 7).

Listing 7

View Metrics

$ zpool iostat -v myvol
               capacity     operations     bandwidth
pool         alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
myvol        1.62M  25.2T      0      1     84  15.2K
  raidz1     1.62M  25.2T      0      1     67  13.5K
    sdc          -      -      0      0     16  3.40K
    sdd          -      -      0      0     16  3.39K
    sde          -      -      0      0     16  3.38K
    sdf          -      -      0      0     16  3.37K
logs             -      -      -      -      -      -
  nvme0n1p2      0    92G      0      0    586  56.6K
cache            -      -      -      -      -      -
  nvme0n1p1  16.5K  93.1G      0      0  1.97K    636
-----------  -----  -----  -----  -----  -----  -----
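To track a single device over time, for instance the SLOG partition, you can pull its row out of the report. This is a rough sketch that assumes the seven-column layout of Listing 7 (name, alloc, free, read and write operations, read and write bandwidth); the function name device_bandwidth is an illustrative choice, not a ZFS tool.

```shell
#!/bin/sh
# device_bandwidth DEVICE: read `zpool iostat -v` output on stdin and
# print the read and write bandwidth columns for DEVICE.
#
# Usage: zpool iostat -v myvol | device_bandwidth nvme0n1p2
device_bandwidth() {
    device="$1"
    awk -v dev="$device" '
        # Data rows have seven columns; bandwidth is in columns 6 and 7.
        $1 == dev && NF == 7 {
            printf "%s read=%s write=%s\n", $1, $6, $7
        }
    '
}
```

The values keep the human-readable suffixes (K, M, and so on) that zpool iostat prints, so convert them before doing arithmetic. Passing an interval in seconds after the pool name makes zpool iostat print a new report at that interval, in which case the function emits one line per report.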

Conclusion

As you can see, ZFS is equipped with an arsenal of features that help it perform better in mission-critical and demanding high-performance environments. With an active community supporting ZFS, the filesystem is also likely to keep gaining features and improvements in the near future.

Infos

  1. ZFS on Linux: https://zfsonlinux.org/

The Author

Petros Koutoupis is currently a senior performance software engineer at Cray for its Lustre High Performance File System division. He is also the creator and maintainer of the RapidDisk Project (http://www.rapiddisk.org/). Petros has worked in the data storage industry for well over a decade and has helped to pioneer the many technologies unleashed in the wild today.
