ZFS Tuning for HPC

If you manage storage servers, chances are you are already aware of ZFS and some of the features and functions it boasts. In short, ZFS is a combined all-purpose filesystem and volume manager that simplifies data storage management while offering some advanced features, including drive pooling with software RAID support, file snapshots, in-line data compression, data deduplication, built-in data integrity, advanced caching (to DRAM and SSD), and more.

ZFS is licensed under the Common Development and Distribution License (CDDL), a weak copyleft license based on the Mozilla Public License (MPL). Although open source, code under the CDDL is widely considered incompatible with the GNU General Public License (GPL). This hasn't stopped ZFS enthusiasts from porting it to the Linux kernel, where it is maintained out of tree by the ZFS on Linux (ZoL) project.

The ZoL project not only introduced the advanced filesystem to Linux users, it also attracted a sizable user base, a group of developers, and an entire community to support it. With the filesystem now serving a wide variety of applications (HPC included), it often becomes necessary to know how to tune it and understand which knobs to turn.

Before applying the methods described in this article to a production system, proceed with caution and test them in dry runs first.

Creating the Test Environment

To begin, you need a server (or virtual machine) with one or more spare drives. I advise more than one because when it comes to performance, spreading I/O load across more disk drives instead of bottlenecking a single drive helps significantly. Therefore, I use four local drives – sdc, sdd, sde, and sdf – in this article:

$ cat /proc/partitions|grep sd
   8        0  488386584 sda
   8        1       1024 sda1
   8        2  488383488 sda2
   8       16   39078144 sdb
   8       32 6836191232 sdc
   8       48 6836191232 sdd
   8       64 6836191232 sde
   8       80 6836191232 sdf
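
Alternatively, lsblk from util-linux lists the same whole drives with sizes in human-readable form (the output will vary with your hardware):

$ lsblk -d -o NAME,SIZE,TYPE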

Make sure to load the ZFS modules,

$ sudo modprobe zfs

and verify that they are loaded:

$ lsmod|grep zfs
zfs                  3039232  3
zunicode              331776  1 zfs
zavl                   16384  1 zfs
icp                   253952  1 zfs
zcommon                65536  1 zfs
znvpair                77824  2 zfs,zcommon
spl                   102400  4 zfs,icp,znvpair,zcommon
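
To have the module load automatically at boot on a systemd-based distribution, one common approach is a modules-load.d entry. This is a minimal sketch; note that most ZFS packages already install an equivalent, so check before adding your own:

# tell systemd-modules-load to load zfs at boot
$ echo zfs | sudo tee /etc/modules-load.d/zfs.conf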

With the four drives identified above, I create a ZFS RAIDZ pool, which is roughly equivalent to RAID5 (single parity),

$ sudo zpool create -f myvol raidz sdc sdd sde sdf

and verify the status of the pool (Listing 1) and that it has been mounted (Listing 2).

Listing 1: Pool Status

$ zpool status
  pool: myvol
 state: ONLINE
  scan: none requested
config:
 
  NAME        STATE     READ WRITE CKSUM
  myvol       ONLINE       0     0     0
    raidz1-0  ONLINE       0     0     0
      sdc     ONLINE       0     0     0
      sdd     ONLINE       0     0     0
      sde     ONLINE       0     0     0
      sdf     ONLINE       0     0     0
 
errors: No known data errors

Listing 2: Pool Mounted

$ df -ht zfs
Filesystem      Size  Used Avail Use% Mounted on
myvol            18T  128K   18T   1% /myvol
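
In production, you would typically reference drives by their persistent /dev/disk/by-id paths (so the pool survives device renumbering) and set an explicit sector-size hint. A hedged sketch follows; the ata-DRIVE* names are placeholders for your own device IDs:

# ashift=12 hints 4KB sectors, a common choice for modern drives
$ sudo zpool create -f -o ashift=12 myvol raidz \
      /dev/disk/by-id/ata-DRIVE1 /dev/disk/by-id/ata-DRIVE2 \
      /dev/disk/by-id/ata-DRIVE3 /dev/disk/by-id/ata-DRIVE4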

Some Basic Tuning

A few general procedures can tune a ZFS filesystem for performance, such as disabling file access time updates in the file metadata. Historically, filesystems have tracked when a user or application accesses a file, logging the most recent time of access even if the file was only read and not modified. Updating this field generates unnecessary metadata I/O. To avoid it, simply turn off the atime parameter:

$ sudo zfs set atime=off myvol

To verify that it has been turned off, use the zfs get atime command:

$ sudo zfs get atime myvol
NAME   PROPERTY  VALUE  SOURCE
myvol  atime     off    local
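
If some of your applications still rely on access times, ZFS on Linux also offers a relatime property as a middle ground: access times are only written out under limited conditions rather than on every read. Note that relatime takes effect when atime is also enabled:

$ sudo zfs set atime=on myvol
$ sudo zfs set relatime=on myvol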

Another parameter that can affect performance is compression. Although some algorithms (e.g., LZ4) are known to perform extremely well, compression still consumes CPU time on every read and write. Therefore, disable filesystem compression,

$ sudo zfs set compression=off myvol

and verify that compression has been turned off:

$ sudo zfs get compression myvol
NAME   PROPERTY     VALUE     SOURCE
myvol  compression  off       default
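
This is a workload-dependent trade-off: on compressible data, LZ4 can actually improve throughput by reducing the number of bytes that hit the disks. If you later decide the CPU cost is worth it, re-enabling it is a one-liner:

$ sudo zfs set compression=lz4 myvol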

To view all available parameters, use zfs get all (Listing 3).

Listing 3: View Parameters

$ zfs get all myvol
NAME   PROPERTY              VALUE                  SOURCE
myvol  type                  filesystem             -
myvol  creation              Sat Feb 22 22:09 2020  -
myvol  used                  471K                   -
[ ... ]
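
To check just the handful of properties you have been tuning rather than the full list, zfs get also accepts a comma-separated list of property names:

$ zfs get atime,compression myvol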
