Useful NFS options for tuning and management
Tune-Up
NFS is the most widely used HPC filesystem. It is very easy to set up and performs reasonably well as primary storage for small to medium clusters. You can even use it for larger clusters if your applications do not use it for I/O (e.g., /home). NFS is simple, easy to understand, and has known failure modes.
One of the most common questions about NFS configuration is how to tune it for performance and management and what options are typically used. Tuning for performance is a loaded question because performance is defined by so many variables, the most important of which is how you measure performance. However, you can at least identify options for improving NFS performance. In this article, I'll go over some of the major options and illustrate some of their pluses and minuses.
In addition to tuning for performance, I will present a few useful options for managing and securing NFS. It's not an extensive list by any stretch of the imagination, but the options are typical for NFS. NFS tuning can occur on both servers and clients. You can also tune both the NFS client and server TCP stacks. In this article, I've broken the list of tuning options into three groups: (1) NFS performance tuning options, (2) system tuning options, and (3) NFS management/policy options (Table 1). In the sections that follow, these options are presented and discussed.
Table 1
Tuning Options

| NFS Performance Tuning Options |
|---|
| Synchronous vs. asynchronous |
| Number of NFS daemons (nfsd) |
| Block size setting |
| Timeout and retransmission |
| FS-Cache |
| Filesystem-independent mount options |

| System Tuning Options |
|---|
| System memory |
| MTU |
| TCP tuning on the server |

| NFS Management/Policy Options |
|---|
| Subtree checking |
| Root squashing |
Synchronous vs. Asynchronous
Most people use the synchronous option on the NFS server. For synchronous writes, the server replies to NFS clients only when the data has been written to stable storage. Many people prefer this option because they have little chance of losing data if the NFS server goes down or network connectivity is lost.
Asynchronous mode allows the server to reply to the NFS client as soon as it has processed the I/O request and sent it to the local filesystem; that is, it does not wait for the data to be written to stable storage before responding to the NFS client. This can save time for I/O requests and improve performance. However, if the NFS server crashes before the I/O request gets to disk, you could lose data.
Synchronous or asynchronous mode can be set when the filesystem is mounted on the clients by simply putting sync or async on the mount command line or in the /etc/fstab file for the NFS filesystem. If you want to change the option, you first have to unmount the NFS filesystem, change the option, then remount the filesystem.
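As a sketch, an asynchronous mount in /etc/fstab might look like the following (the server name and paths here are placeholders, not from any real configuration):

```
# Hypothetical /etc/fstab entry mounting an NFS export asynchronously;
# change async to sync, then unmount and remount to switch modes.
server:/home  /home  nfs  async,rw  0 0
```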
The choice between the two modes of operation is up to you. If you have a copy of the data somewhere, you can perhaps run asynchronously for better performance. If you don't have copies or the data cannot be easily or quickly reproduced, then perhaps synchronous mode is the better option. No one can make this determination but you.
Number of NFS Daemons
NFS uses threads on the server to handle incoming and outgoing I/O requests. These show up in the process table as nfsd (NFS daemons). Using threads helps NFS scale to handle large numbers of clients and large numbers of I/O requests. By default, NFS starts with only eight nfsd processes (eight threads), which, given that CPUs today have very large core counts, is not really enough.
You can find the number of NFS daemons in two ways. The first is to look at the process table and count the number of nfsd processes with

ps aux | grep nfsd

The second way is to look at the NFS config file (e.g., /etc/sysconfig/nfs) for an entry named RPCNFSDCOUNT, which tells you the number of NFS daemons for the server.
If the NFS server has a large number of cores and a fair amount of memory, you can increase RPCNFSDCOUNT. I have seen 256 used on an NFS server with 16 cores and 128GB of memory, and it ran extremely well. Even for home clusters, eight NFS daemons is very small, and you might want to consider increasing the number. (I have 8GB on my NFS server with four cores, and I run with 64 NFS daemons.)
You should also increase RPCNFSDCOUNT when you have a large number of NFS clients performing I/O at the same time. For this situation, you should also increase the amount of memory on the NFS server to a large-ish number, such as 128 or 256GB. Don't forget that if you change the value of RPCNFSDCOUNT, you will have to restart NFS for the change to take effect.
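As a sketch for a Red Hat-family system (the config file path, service name, and the value 64 are assumptions; Debian-family systems keep RPCNFSDCOUNT in /etc/default/nfs-kernel-server instead), the change might look like this:

```shell
# Persist the new daemon count in the NFS config file,
# then restart the server so it takes effect:
sudo sed -i 's/^RPCNFSDCOUNT=.*/RPCNFSDCOUNT=64/' /etc/sysconfig/nfs
sudo systemctl restart nfs-server

# Alternatively, change the running thread count immediately (not persistent):
sudo rpc.nfsd 64
cat /proc/fs/nfsd/threads   # confirm the new count
```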
One way to determine whether more NFS threads help performance is to check the data in /proc/net/rpc/nfs for the load on the NFS daemons. The output line that starts with th lists the number of threads, and the last 10 numbers are a histogram of the number of seconds the first 10 percent of threads were busy, the second 10 percent, and so on.
Ideally, you want the last two numbers to be zero or close to zero, indicating that all of the threads are rarely busy at once and requests are not waiting for a free thread. If the last two numbers are fairly high, you should add NFS daemons, because the NFS server has become the bottleneck. If the last two, three, or four numbers are zero, then some threads are probably not being used. Personally, I don't mind this situation if I have enough memory in the system, because the load might grow to the point at which the extra NFS daemons are needed.
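A quick way to eyeball the histogram is a short awk script. The th line below is a made-up sample for illustration; on a live server you would read /proc/net/rpc/nfs directly:

```shell
# Sample "th" line: thread count, count of times all threads were busy,
# then the 10-bucket busy-time histogram (replace with the real line
# from /proc/net/rpc/nfs on your server).
sample='th 8 21 1472.70 30.41 10.03 2.10 0.90 0.32 0.10 0.00 0.00 0.00'

# Print each decile bucket; high values in the last buckets mean the
# threads are often all busy and more daemons may help.
echo "$sample" | awk '{
    printf "threads: %s\n", $2
    for (i = 4; i <= 13; i++)
        printf "decile %2d busy: %s sec\n", i - 3, $i
}'
```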
Block Size Setting
Two NFS client options specify the size of data chunks for writing (wsize) and reading (rsize). If you don't specify the chunk sizes, the defaults are determined by the versions of NFS and the kernel being used. If you have NFS already running and configured, the best way to check the current chunk size is to run the command

cat /proc/mounts

on the NFS client and look for the wsize and rsize values.
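For example, the values can be pulled out of a /proc/mounts line like this (a fabricated sample line is used here so the snippet is self-contained; on a real client you would grep /proc/mounts itself):

```shell
# Sample /proc/mounts entry for an NFSv4 mount (placeholder server and paths);
# on a real client: grep ' nfs' /proc/mounts
mounts_line='server:/home /home nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,hard,proto=tcp 0 0'

# Split the comma-separated option list and keep the chunk-size options.
echo "$mounts_line" | tr ',' '\n' | grep -E '^(rsize|wsize)='
```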
Chunk size affects how many packets or remote procedure calls (RPCs) are created and sent. For example, if you want to transmit 1MB of data using 32KB chunks, the data is sent in 32 chunks and a correspondingly large number of network packets. If you increase the chunk size to 64KB, only 16 chunks are needed, reducing the number of RPCs and packets on the network.
The chunk size that works best for your NFS configuration depends primarily on the network configuration and the applications performing I/O. If your applications are doing lots of small I/O, then a large chunk size would not make sense. The opposite is true as well.
If the applications are using a mixture of I/O payload sizes, then selecting a block size might require some experimentation. You can experiment by changing the options on the clients; you have to unmount and remount the filesystem before rerunning the applications. Don't be afraid to test a range of block sizes, starting with the default and moving into the megabyte range.
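One test cycle might look like the following sketch (the mountpoint, export, and sizes here are placeholders; rsize and wsize are given in bytes):

```shell
# Unmount, remount with a 1MB chunk size, then rerun the benchmark.
sudo umount /mnt/nfstest
sudo mount -t nfs -o rsize=1048576,wsize=1048576 server:/export /mnt/nfstest

# Record the results, then repeat with other sizes,
# e.g., 32768, 65536, 131072, 262144.
```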