Shared Storage with NFS and SSHFS
File Sharing
Testing SSHFS with Linux
Now that SSHFS is installed, you can test it. For this example, I will use SSHFS to mount a directory named CLUSTERBUFFER2 (Listing 1) from the desktop onto a test system as /home/laytonjb/DATA. First, I need to create the directory on the test system:
[laytonjb@test8 ~]$ mkdir DATA
[laytonjb@test8 ~]$ cd DATA
[laytonjb@test8 DATA]$ ls -s
total 0
Listing 1
CLUSTERBUFFER2
[laytonjb@home4 CLUSTERBUFFER2]$ ls -s
total 3140
   4 BLKTRACE/                       4 NFSIOSTAT/       296 SE_strace.ppt
   4 CHECKSUM/                       4 NPB/               4 STRACE/
  32 Data_storage_Discussion.doc    4 OLD/               4 STRACE_NEW_PY/
   4 FS_AGING/                       4 OTHER2/            4 STRACENG/
   4 FS_Scan/                        4 PAK/               4 STRACE_PY/
 324 fs_scan.tar.gz               164 pak.tar.gz          4 STRACE_PY2/
   4 IOSTAT/                         4 PERCEUS/           4 STRACE_PY_FILE_POINTER/
1300 LWESFO06.pdf                    4 REAL_TIME/       944 Using Tracing Tools.ppt
   4 LYLE_LONG/                      4 REPLAYER/
An empty mount point (directory) is recommended, because when the remote filesystem is mounted, it will "cover up" any files in the directory (this is also true for NFS).
Mounting a filesystem using SSHFS can be done by any user with no admin intervention. The basic form of the SSHFS command is:
$ sshfs user@host:[dir] [local dir]
For this example, the following command was used:
[laytonjb@test8 ~]$ sshfs 192.168.1.4:/home/laytonjb/DATA /home/laytonjb/DATA
laytonjb@192.168.1.4's password:
Notice that the full paths for both the remote directory and the local directory were used. Also notice that the password on the desktop (the storage server) had to be entered. This could be avoided with the use of passwordless SSH and SSH keys.
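For example, passwordless SSH can be set up with a couple of commands. The key type, host address, and paths below follow the example above; adjust them to your own setup:

```shell
# On the client (the test system), generate a key pair; an empty
# passphrase allows fully unattended mounts.
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519

# Copy the public key to the storage server (the last password prompt
# you should see).
ssh-copy-id laytonjb@192.168.1.4

# From now on, SSHFS mounts no longer ask for a password.
sshfs 192.168.1.4:/home/laytonjb/DATA /home/laytonjb/DATA
```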
To make sure everything was mounted correctly, take a look at the content of the local directory on the test system (Listing 2).
Listing 2
Local Directory on Test System
[laytonjb@test8 ~]$ cd DATA
[laytonjb@test8 DATA]$ ls -s
total 3140
   4 BLKTRACE                        4 NFSIOSTAT        296 SE_strace.ppt
   4 CHECKSUM                        4 NPB                4 STRACE
  32 Data_storage_Discussion.doc    4 OLD                4 STRACE_NEW_PY
   4 FS_AGING                        4 OTHER2             4 STRACENG
   4 FS_Scan                         4 PAK                4 STRACE_PY
 324 fs_scan.tar.gz               164 pak.tar.gz          4 STRACE_PY2
   4 IOSTAT                          4 PERCEUS            4 STRACE_PY_FILE_POINTER
1300 LWESFO06.pdf                    4 REAL_TIME        944 Using Tracing Tools.ppt
   4 LYLE_LONG                       4 REPLAYER
SSHFS can be used on both secured and unsecured networks and by users without administrator intervention. As long as the user has permissions for the remote directory, then SSHFS can be used. Most importantly, you should pay attention to the mapping of user IDs (UIDs) and group IDs (GIDs) between systems. If the permissions on the client system don't match the server system, some mapping may be required between the two systems. SSHFS has the ability to read a file that contains mapping information and then take care of the mapping between the systems if needed.
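As a sketch of that mapping capability, SSHFS provides an idmap option. The mapping file names and the UID/GID values below are purely illustrative (check the sshfs documentation for the exact file format your version expects), and for the common single-user case, -o idmap=user avoids mapping files entirely:

```shell
# Illustrative mapping files: "username:id" pairs telling SSHFS how to
# translate remote IDs. The names and IDs here are made up.
echo "laytonjb:1000" > ~/.sshfs_uidmap
echo "laytonjb:1000" > ~/.sshfs_gidmap

# Mount with file-based ID translation ...
sshfs -o idmap=file,uidfile=$HOME/.sshfs_uidmap,gidfile=$HOME/.sshfs_gidmap \
      192.168.1.4:/home/laytonjb/DATA /home/laytonjb/DATA

# ... or, for a single user, simply map the remote user's UID/GID to
# the local user's:
sshfs -o idmap=user 192.168.1.4:/home/laytonjb/DATA /home/laytonjb/DATA
```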
Both SSHFS and SSH have a number of tuning options that affect performance. In a previous exercise [5], multiple options were tested and compared with NFS on two systems. Using the defaults for SSHFS, SSH, and NFS, NFS had much better streaming and input/output operations per second (IOPS) performance than SSHFS. With caching, large reads, and the use of a fast encryption algorithm named Arcfour [6], SSHFS performance matched that of NFS for both streaming and IOPS tests.
A second round of tests was performed by adjusting network and TCP tuning options. When coupled with the previous SSHFS and SSH parameters, SSHFS performance exceeded NFS for both streaming and IOPS tests. However, Arcfour is no longer recommended for use, so when the encryption algorithm was changed to the default AES-128 encryption [7], the resulting SSHFS performance could not match that of NFS in streaming or IOPS tests.
To try to regain the performance, a "compress" SSHFS option was used, which brought the SSHFS performance close to NFS, but not quite. However, it does illustrate that you can get very close to NFS performance with some tuning options.
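Putting those pieces together, a tuned mount might look like the following. The option names are standard sshfs/FUSE/OpenSSH options, but whether each one helps (or is even available; large_read is an older FUSE option) depends on your sshfs, FUSE, and OpenSSH versions and your workload:

```shell
# kernel_cache/auto_cache enable caching; large_read requests bigger
# reads on older FUSE versions; Compression and Ciphers are SSH options
# that sshfs passes through to the underlying ssh connection.
sshfs -o kernel_cache -o auto_cache -o large_read \
      -o Compression=yes -o Ciphers=aes128-ctr \
      192.168.1.4:/home/laytonjb/DATA /home/laytonjb/DATA
```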
Summary
Shared filesystems are almost 100 percent mandatory for HPC. Although you can run applications without one, it is not pleasant. Shared filesystems allow a constant view of user data across all nodes in the cluster and can make admin tasks easier.
Two main options, particularly if you are just starting out, are NFS and SSHFS. NFS has been around a long time, and the failure modes are pretty well understood. It also comes with every distribution of Linux. NFSv4 is the latest and has lots of possibilities for clusters. It is also fairly simple to tune NFS for your workloads and your configuration.
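As an instance of that tuning, a handful of common client-side mount options go a long way. The server name and export path below are placeholders; see the NFS references in the Infos section for the full option list:

```shell
# Illustrative NFSv4 client mount with larger read/write transfer
# sizes; "server:/home" stands in for your NFS server and export.
sudo mount -t nfs -o vers=4,rsize=65536,wsize=65536,hard,proto=tcp \
     server:/home /mnt/home
```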
SSHFS is a bit of a dark horse for clusters. However, it offers the possibility of a shared filesystem using SSH, which can help with security, because only port 22 needs to be open (which you need for MPI application communications, anyway). SSHFS also uses SFTP encryption from one node to another. Combined with encrypted devices, you can have an end-to-end encrypted shared data service.
SSHFS is very tunable, and you can attain performance very close, if not equal, to NFS. I have not tested SSHFS at any scale larger than 32 clients and one data server, but it was quite stable and worked very well. If you need to go to a larger node count, I recommend you test SSHFS first before making a commitment.
Infos
[1] NFS options: https://www.systutorials.com/docs/linux/man/5-nfs/
[2] "Optimizing Your NFS Filesystem" by Jeff Layton: http://www.admin-magazine.com/HPC/Articles/Useful-NFS-Options-for-Tuning-and-Management
[3] FUSE: https://github.com/libfuse/libfuse
[4] SSHFS: https://github.com/libfuse/sshfs
[5] "SSHFS – Installation and Performance" by Jeff Layton: http://www.admin-magazine.com/HPC/Articles/Sharing-Data-with-SSHFS
[6] Arcfour: https://en.wikipedia.org/wiki/RC4
[7] AES 128: https://en.wikipedia.org/wiki/Advanced_Encryption_Standard