Working with the Lustre Filesystem
The Lustre open source distributed, parallel filesystem scales to high-performance computing environments.
What do you do when you need to deploy a large filesystem that is scalable to the exabyte level and supports a large-client, simultaneous-access workload? You find a parallel distributed filesystem such as Lustre. In this article, I build the high-performance Lustre filesystem from source, install it on multiple machines, mount it from clients, and access it in parallel.
Lustre Filesystems
A distributed filesystem allows access to files from multiple hosts sharing the files within a computer network, which makes it possible for multiple users on multiple client machines to share files and storage resources. The client machines do not have direct access to the underlying block storage sharing those files; instead, they communicate with a set or cluster of server machines hosting those files and the filesystem to which they are written.
Lustre (a portmanteau of Linux and cluster) [1]-[3] is one such distributed filesystem, usually deployed for large-scale cluster high-performance computing (HPC). Licensed under the GNU General Public License (GPL), Lustre provides a solution in which high performance and scalability to tens of thousands of nodes (including the clients) and exabytes of storage become a reality, and it is relatively simple to deploy and configure. As of this writing, the Lustre project is at version 2.14, nearing the official release of 2.15 (currently under development), which will be the next long-term support (LTS) release.
Lustre has a somewhat unique architecture, with four major functional units: (1) a single Management Service (MGS), which can be hosted on its own machine or on one of the metadata machines; (2) the Metadata Service (MDS), which contains Metadata Targets (MDTs); (3) Object Storage Services (OSS), which store file data on one or more Object Storage Targets (OSTs); and (4) the clients that access and use the file data.
For each Lustre filesystem, MDTs store namespace metadata, which includes file names, directories, access permissions, and file layouts. The MDT data is stored in a single-disk dedicated filesystem that maps locally to the serving node, controls file access, and informs the client nodes which objects make up a file. One or more MDS nodes can exist in a single Lustre filesystem with one or more MDTs each.
An OST is a dedicated object-based filesystem exported for read and write operations. The capacity of a Lustre filesystem is determined by the sum of the total capacities of the OSTs.
Lustre presents all clients with a unified namespace for all of the files and data in the filesystem, which allows concurrent and coherent read and write access to the files in the filesystem. When a client accesses a file, it completes a file name lookup on the MDS, and either a new file is created or the layout of an existing file is returned to the client.
Locking the file on the OST, the client then runs one or more read or write operations to the file but does not directly modify the objects on the OST. Instead, it delegates tasks to the OSS. This approach ensures scalability and improved security and reliability, because it does not allow direct access to the underlying storage, which would increase the risk of filesystem corruption from misbehaving or defective clients.
Although all four components (MGS, MDT, OST, and client) can run on the same node, they are typically configured on separate nodes communicating over a network.
Prerequisites
In this article, I use eight nodes, four of which will be configured as client machines and the rest as the servers hosting the Lustre filesystem. Although not required, all eight systems will run CentOS 8.5.2111. As the names imply, the servers will host the target Lustre filesystem; the clients will not only mount it, but also write to it.
For the configuration, you need to build the filesystem packages for both the clients and the servers, which means you will need to install package dependencies from the package repositories:
$ sudo dnf install wget git make gcc kernel-devel \
    epel-release automake binutils libtool bison byacc \
    kernel-headers elfutils-libelf-devel elfutils-libelf \
    kernel-rpm-macros kernel-abi-whitelists keyutils-libs \
    keyutils-libs-devel libnl3 libnl3-devel rpm-build \
    libselinux-devel
Next, enable the powertools repository and install the following packages:
$ sudo dnf config-manager --set-enabled powertools
$ sudo dnf install dkms libyaml-devel
To build Lustre from source, you need to grab the updated e2fsprogs packages for your respective distribution and version hosted on the Whamcloud project website. In this case, I downloaded and installed the necessary packages for my system:
e2fsprogs-1.46.2.wc4-0.el8.x86_64.rpm
e2fsprogs-devel-1.46.2.wc4-0.el8.x86_64.rpm
e2fsprogs-libs-1.46.2.wc4-0.el8.x86_64.rpm
libcom_err-1.46.2.wc4-0.el8.x86_64.rpm
libcom_err-devel-1.46.2.wc4-0.el8.x86_64.rpm
libss-1.46.2.wc4-0.el8.x86_64.rpm
libss-devel-1.46.2.wc4-0.el8.x86_64.rpm
An RPM build environment needs to be created next, which will only be used once to grab, install, and extract the source kernel packages:
$ mkdir -p ~/rpmbuild/{BUILD,BUILDROOT,RPMS,SOURCES,SPECS,SRPMS}
$ echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
The Lustre filesystem relies on a local filesystem to store local objects. The project supports ZFS and a patched version of ext4 called LDISKFS, which I use for the build with the ext4 source from a running kernel. To grab the correct kernel source, you need to make a note of your distribution and its version,
$ cat /etc/redhat-release
CentOS Linux release 8.5.2111
as well as the kernel version:
$ uname -r
4.18.0-348.7.1.el8_5.x86_64
The location of the kernel source package depends on the release and kernel version reported above. Listing 1 shows the commands for my setup to grab the kernel's source, install the source RPM, change into the directory containing the source objects, and extract the kernel tarball. The final three lines change to the kernel/fs source directory (which should mostly be empty) of the currently installed kernel source, rename the existing ext4 directory, and copy the extracted ext4 source into the current directory.
Listing 1: Kernel Source
$ wget https://vault.centos.org/8.5.2111/BaseOS/Source/SPackages/kernel-4.18.0-348.7.1.el8_5.src.rpm
$ sudo rpm -ivh kernel-4.18.0-348.7.1.el8_5.src.rpm
$ cd ~/rpmbuild/SOURCES
$ tar xJf linux-4.18.0-348.7.1.el8_5.tar.xz
$ cd /usr/src/kernels/4.18.0-348.7.1.el8_5.x86_64/fs/
$ sudo mv ext4/ ext4.orig
$ sudo cp -r ~/rpmbuild/SOURCES/linux-4.18.0-348.7.1.el8_5/fs/ext4 .
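The download URL in Listing 1 is built from the release string and the kernel version noted earlier. As a minimal sketch, assuming the vault.centos.org directory layout seen in that URL (other distributions and mirrors will differ), the hypothetical helper kernel_srpm_url below assembles it from the two values:

```shell
# Build the source RPM URL from a CentOS release and a kernel version.
# kernel_srpm_url is a hypothetical helper; the URL layout is assumed from
# the one used in Listing 1.
kernel_srpm_url() {
    release=$1          # e.g., 8.5.2111 from /etc/redhat-release
    kver=$2             # e.g., the output of uname -r
    base=${kver%.*}     # drop the trailing architecture (.x86_64)
    printf 'https://vault.centos.org/%s/BaseOS/Source/SPackages/kernel-%s.src.rpm\n' \
        "$release" "$base"
}

kernel_srpm_url 8.5.2111 4.18.0-348.7.1.el8_5.x86_64
```

On a live system, the second argument could be "$(uname -r)"; check that the resulting URL exists before relying on it.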
Building Lustre from Source
The next steps check out the Lustre source code in your home directory, change into the source directory, check out the desired branch, and set the version string:
$ cd ~
$ git clone git://git.whamcloud.com/fs/lustre-release.git
$ cd lustre-release
$ git branch
$ git checkout master
$ ./LUSTRE-VERSION-GEN
To build the client packages, type:
$ sh autogen.sh && ./configure --disable-server && make rpms
When the build completes without error, the RPMs shown in Listing 2 will be listed in the root of the source directory.
Listing 2: RPMs After the Build
$ ls *.rpm
kmod-lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-client-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-client-tests-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-client-tests-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-2.14.56_111_gf8747a8-1.src.rpm
lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-debugsource-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-devel-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-tests-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-tests-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-iokit-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
Now you need to install the client packages on the client nodes and verify that the packages and the version have been installed:
$ sudo dnf install {kmod-,}lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
$ rpm -qa|grep lustre
kmod-lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64
To build the server packages, type:
$ sh autogen.sh && ./configure && make rpms
When the build completes, you will find the RPMs shown in Listing 3 in the root of the source directory:
Listing 3: Source Root RPMs
[centos@ip-172-31-54-176 lustre-release]$ ls *.rpm
kmod-lustre-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-osd-ldiskfs-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-osd-ldiskfs-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-tests-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-tests-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-2.14.56_111_gf8747a8-1.src.rpm
lustre-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-debugsource-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-devel-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-iokit-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-osd-ldiskfs-mount-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-osd-ldiskfs-mount-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-resource-agents-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-tests-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-tests-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
To install the packages on the nodes designated as servers, type:
$ sudo dnf install *.rpm
Then, verify that the packages and the version have been installed. I installed the packages shown in Listing 4. Before proceeding, please read the “Configuring the Servers” box.
Listing 4: Packages on Nodes
[centos@ip-172-31-54-176 RPMS]$ rpm -qa|grep lustre
lustre-osd-ldiskfs-mount-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-osd-ldiskfs-mount-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64
kmod-lustre-osd-ldiskfs-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64
kmod-lustre-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64
kmod-lustre-osd-ldiskfs-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64
kmod-lustre-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-iokit-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-debugsource-2.14.56_111_gf8747a8-1.el8.x86_64
Configuring the Servers
For the sole purpose of convenience, I have deployed virtual machines to host this entire tutorial. I will also be limited to a 1 Gigabit Ethernet (GigE) network. On each of the virtual machines designated to host the Lustre filesystem, a secondary, approximately 50GB drive is attached.
Preparing the Metadata Servers
You now have Lustre builds for both the client and server setups, so I now switch the focus to using those builds to configure both. Although a separate node could have been used to host the management service (i.e., the MGS), I instead opted to use the first MDS hosting the first MDT as the management service. To do this, add the --mgs option when formatting the device for Lustre. A Lustre deployment can host anything from a single MDT to 64 or more; in this example, however, I format just one (Listing 5). If you do choose to format additional MDTs, be sure to increment the value of the index parameter by one each time and specify the network identifier (NID) for the MGS node with --mgsnode=<NID> (shown in the “Preparing the Object Storage Servers” section).
Listing 5: Formatting the MDT
$ sudo mkfs.lustre --fsname=testfs --index=0 --mgs --mdt /dev/sdb
   Permanent disk data:
Target:     testfs:MDT0000
Index:      0
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
checking for existing Lustre data: not found
device size = 48128MB
formatting backing filesystem ldiskfs on /dev/sdb
        target name   testfs:MDT0000
        kilobytes     49283072
        options       -I 512 -i 1024 -J size=1925 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,project,huge_file,ea_inode,large_dir,flex_bg -E lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L testfs:MDT0000 -I 512 -i 1024 -J size=1925 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,project,huge_file,ea_inode,large_dir,flex_bg -E lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdb 49283072k
Writing CONFIGS/mountdata
Now create a mountpoint to host the MDT and then mount it:
$ sudo mkdir /mnt/mdt
$ sudo mount -t lustre /dev/sdb /mnt/mdt/
Because I am not using LDAP and am just trusting my clients (and their users) for this example, I need to execute the following on the same MGS node:
$ sudo lctl set_param mdt.*.identity_upcall=NONE
Note that the above command should NOT be deployed in production because it could potentially lead to security concerns and issues.
Make note of the management server's IP address (Listing 6). Together with the network type, this address forms the Lustre Networking (LNET) NID, which can be verified with:
$ sudo lctl list_nids
10.0.0.2@tcp
Listing 6: Management Server
$ sudo ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.0.0.2  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::bfd3:1a4b:f76b:872a  prefixlen 64  scopeid 0x20<link>
        ether 42:01:0a:80:00:02  txqueuelen 1000  (Ethernet)
        RX packets 11919  bytes 61663030 (58.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10455  bytes 973590 (950.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
LNET is Lustre's network communication protocol, which is designed to be lightweight and efficient. It supports message passing for remote procedure call (RPC) request processing and remote direct memory access (RDMA) for bulk data movement. All metadata and file data I/O are managed through LNET.
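An NID has the form <address>@<network>. In this article, lctl list_nids reports 10.0.0.2@tcp, whereas the format and mount commands use 10.0.0.2@tcp0; the two refer to the same network, because a network type without a number is treated as network 0. The following is a minimal sketch of splitting and normalizing an NID; the helper names nid_address and nid_network are hypothetical and not part of the Lustre tools:

```shell
# Print the address part of an NID (everything before the last "@").
nid_address() { printf '%s\n' "${1%@*}"; }

# Print the network part of an NID, appending 0 when no number is given
# so that "tcp" and "tcp0" compare equal.
nid_network() {
    net=${1#*@}
    case $net in
        *[0-9]) printf '%s\n' "$net" ;;     # already numbered, e.g., tcp0
        *)      printf '%s0\n' "$net" ;;    # bare type, e.g., tcp
    esac
}

nid_address 10.0.0.2@tcp     # prints 10.0.0.2
nid_network 10.0.0.2@tcp     # prints tcp0
nid_network 10.0.0.2@tcp0    # prints tcp0
```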
Preparing the Object Storage Servers
On the next server, I format the secondary storage volume to be the first OST with an index of 0, while pointing to the MGS node with --mgsnode=10.0.0.2@tcp0 (Listing 7). Then, I create a mountpoint to host the OST and mount it:
$ sudo mkdir /mnt/ost
$ sudo mount -t lustre /dev/sdb /mnt/ost/
Listing 7: Format the OST
$ sudo mkfs.lustre --reformat --index=0 --fsname=testfs --ost --mgsnode=10.0.0.2@tcp0 /dev/sdb
   Permanent disk data:
Target:     testfs:OST0000
Index:      0
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.0.0.2@tcp
device size = 48128MB
formatting backing filesystem ldiskfs on /dev/sdb
        target name   testfs:OST0000
        kilobytes     49283072
        options       -I 512 -i 1024 -J size=1024 -q -O extents,uninit_bg,dir_nlink,quota,project,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0000 -I 512 -i 1024 -J size=1024 -q -O extents,uninit_bg,dir_nlink,quota,project,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdb 49283072k
Writing CONFIGS/mountdata
On the rest of the server nodes, I follow the same procedure, incrementing the index parameter value by one each time (Listing 8). Be sure to create the local mountpoint to host the OST and then mount it.
Listing 8: The Rest of the Nodes
$ sudo mkfs.lustre --reformat --index=1 --fsname=testfs --ost --mgsnode=10.0.0.2@tcp0 /dev/sdb
   Permanent disk data:
Target:     testfs:OST0001
Index:      1
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x62
[ ... ]
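Because only the index changes from one OST to the next, the per-server commands can be generated rather than typed by hand. The following dry-run sketch only prints the commands and runs nothing; each printed group is meant for a different server and would be run there with sudo. FSNAME, MGSNID, and DEVICE mirror the values used in this article, whereas OST_COUNT and ost_commands are hypothetical names introduced for illustration:

```shell
# Dry run: print (do not execute) the format and mount commands per OST.
FSNAME=testfs
MGSNID=10.0.0.2@tcp0
DEVICE=/dev/sdb
OST_COUNT=3     # hypothetical variable: number of OSTs to prepare

ost_commands() {
    index=$1
    # Target names carry a four-digit hexadecimal index (OST0000, OST0001, ...).
    printf '# target %s:OST%04x\n' "$FSNAME" "$index"
    printf 'mkfs.lustre --reformat --index=%d --fsname=%s --ost --mgsnode=%s %s\n' \
        "$index" "$FSNAME" "$MGSNID" "$DEVICE"
    printf 'mkdir /mnt/ost\n'
    printf 'mount -t lustre %s /mnt/ost/\n' "$DEVICE"
}

i=0
while [ "$i" -lt "$OST_COUNT" ]; do
    ost_commands "$i"
    i=$((i + 1))
done
```

Printing the commands first makes it easy to confirm that no index is reused before anything is formatted.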
Using the Clients
To mount the filesystem on a client, you need to specify the filesystem type, the NID of the MGS, the filesystem’s name, and the mountpoint on which to mount it. The template for the command and the command I used are:
mount -t lustre <MGS NID>:/<fsname> <mountpoint>
mount -t lustre 10.0.0.2@tcp0:/testfs /lustre
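The mount source combines two of those pieces: the NID of the MGS and the filesystem name, joined by ":/". As a small sketch (the three helper names are hypothetical), the source can be taken apart or assembled as follows; the mount command is only printed, not executed:

```shell
# Split a Lustre mount source of the form <MGS NID>:/<fsname>.
mount_source_nid()    { printf '%s\n' "${1%%:/*}"; }
mount_source_fsname() { printf '%s\n' "${1##*:/}"; }

# Compose the client mount command from NID, filesystem name, and mountpoint.
client_mount_cmd() {
    printf 'mount -t lustre %s:/%s %s\n' "$1" "$2" "$3"
}

src=10.0.0.2@tcp0:/testfs
mount_source_nid "$src"       # prints 10.0.0.2@tcp0
mount_source_fsname "$src"    # prints testfs
client_mount_cmd 10.0.0.2@tcp0 testfs /lustre
```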
In the examples below, I rely on pdsh to run commands on multiple remote hosts simultaneously. All four clients will need a local directory on which to mount the remote filesystem,
$ sudo pdsh -w 10.0.0.[3-6] mkdir -pv /lustre
after which, you can mount the remote filesystem on all clients:
$ sudo pdsh -w 10.0.0.[3-6] mount -t lustre 10.0.0.2@tcp0:/testfs /lustre
Each client now has access to the remote Lustre filesystem. The filesystem is currently empty:
$ sudo ls /lustre/
$
As a quick test, create an empty file and verify that it has been created:
$ sudo touch /lustre/test.txt
$ sudo ls /lustre/
test.txt
All four clients should be able to see the same file:
$ sudo pdsh -w 10.0.0.[3-6] ls /lustre
10.0.0.3: test.txt
10.0.0.5: test.txt
10.0.0.6: test.txt
10.0.0.4: test.txt
You can clean up the output so that you do not see the same instance repeated over and over again:
$ sudo pdsh -w 10.0.0.[3-6] ls /lustre | dshbak -c
----------------
10.0.0.[3-6]
----------------
test.txt
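The pdsh range 10.0.0.[3-6] stands for the four clients 10.0.0.3 through 10.0.0.6. Tools that do not understand the bracket syntax need the hosts spelled out; the mpirun commands later in this article, for example, take a comma-separated list. A minimal sketch of generating that list, with expand_hosts as a hypothetical helper name:

```shell
# Expand a prefix and an inclusive numeric range into a comma-separated list.
expand_hosts() {
    prefix=$1 first=$2 last=$3
    list=
    n=$first
    while [ "$n" -le "$last" ]; do
        list=${list:+$list,}$prefix$n    # add a comma only between entries
        n=$((n + 1))
    done
    printf '%s\n' "$list"
}

expand_hosts 10.0.0. 3 6    # prints 10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6
```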
I/O and Performance Benchmarking
MDTest is an MPI-based metadata performance testing application designed to test parallel filesystems, and IOR is a benchmarking utility also designed to test the performance of distributed filesystems. To put it more simply: With MDTest, you would typically test the metadata operations involved in creating, removing, and reading objects such as directories, files, and so on, whereas IOR is more straightforward and just focuses on benchmarking buffered or direct sequential or random write-read throughput to the filesystem. Both are maintained and distributed together under the IOR GitHub project. To build the latest IOR package from source, you need to install a Message Passing Interface (MPI) framework, then clone, build, and install the test utilities:
$ sudo dnf install mpich mpich-devel
$ git clone https://github.com/hpc/ior.git
$ cd ior
$ ./bootstrap
$ MPICC=/usr/lib64/mpich/bin/mpicc ./configure
$ cd src/
$ make && sudo make install
You are now ready to run a simple benchmark of your filesystem.
IOR
The benchmark will give you a general idea of how the filesystem performs in its current environment. I rely on mpirun to dispatch the I/O generated by IOR in parallel across the clients; in the end, I get an aggregated result of the entire job execution.
The filesystem is currently empty, with the exception of the file created earlier to test the filesystem. Both the MDT and OSTs are empty with no real file data (Listing 9, executed from the client).
Listing 9: Current Environment
$ sudo lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
testfs-MDT0000_UUID     22419556       10784    19944620   1% /lustre[MDT:0]
testfs-OST0000_UUID     23335208        1764    20852908   1% /lustre[OST:0]
testfs-OST0001_UUID     23335208        1768    20852904   1% /lustre[OST:1]
testfs-OST0002_UUID     23335208        1768    20852904   1% /lustre[OST:2]
filesystem_summary:     70005624        5300    62558716   1% /lustre
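The filesystem_summary line in Listing 9 illustrates the earlier point that the capacity of a Lustre filesystem is the sum of its OST capacities; the MDT is not counted. The sketch below adds up the 1K-blocks column of the OST rows with awk. The sample input is copied from Listing 9, and sum_ost_kblocks is a hypothetical helper name:

```shell
# Sum the 1K-blocks column (field 2) of the OST rows in lfs df output.
sum_ost_kblocks() {
    awk '$1 ~ /-OST[0-9a-f]+_UUID$/ { total += $2 } END { printf "%d\n", total }'
}

# Sample input copied from Listing 9; on a client, pipe in lfs df instead.
sum_ost_kblocks <<'EOF'
UUID                   1K-blocks        Used   Available Use% Mounted on
testfs-MDT0000_UUID     22419556       10784    19944620   1% /lustre[MDT:0]
testfs-OST0000_UUID     23335208        1764    20852908   1% /lustre[OST:0]
testfs-OST0001_UUID     23335208        1768    20852904   1% /lustre[OST:1]
testfs-OST0002_UUID     23335208        1768    20852904   1% /lustre[OST:2]
filesystem_summary:     70005624        5300    62558716   1% /lustre
EOF
```

The result, 70005624, matches the 1K-blocks value on the filesystem_summary line (3 x 23335208).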
Now, run a write-only instance of IOR from the four clients simultaneously to benchmark the performance of the HPC setup. Each client will initiate a single process to write 64MB transfers to a 5GB file (Listing 10).
Listing 10: IOR Write-Only
$ sudo /usr/lib64/mpich/bin/mpirun --host 10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6 /usr/local/bin/ior -F -w -t 64m -k --posix.odirect -D 60 -u -b 5g -o /lustre/test.01
IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
Began               : Tue Jan 25 20:02:21 2022
Command line        : /usr/local/bin/ior -F -w -t 64m -k --posix.odirect -D 60 -u -b 5g -o /lustre/test.01
Machine             : Linux lustre-client1
TestID              : 0
StartTime           : Tue Jan 25 20:02:21 2022
Path                : /lustre/0/test.01.00000000
FS                  : 66.8 GiB   Used FS: 35.9%   Inodes: 47.0 Mi   Used Inodes: 0.0%
Options:
api                 : POSIX
apiVersion          :
test filename       : /lustre/test.01
access              : file-per-process
type                : independent
segments            : 1
ordering in a file  : sequential
ordering inter file : no tasks offsets
nodes               : 4
tasks               : 4
clients per node    : 1
repetitions         : 1
xfersize            : 64 MiB
blocksize           : 5 GiB
aggregate filesize  : 20 GiB
stonewallingTime    : 60
stoneWallingWearOut : 0
Results:
access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
write     1835.22    28.68      0.120209    5242880    65536      0.000934   11.16      2.50       11.16      0
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum
write 1835.22 1835.22 1835.22 0.00 28.68 28.68 28.68 0.00 11.15941 NA NA 0 4 1 1 1 0 1 0 0 1 5368709120 67108864 20480.0 POSIX 0
Finished            : Tue Jan 25 20:02:32 2022
Notice the write throughput of roughly 1.8GiBps (1,835.22MiBps) to the filesystem. Considering that each client is writing to the target filesystem in a single process and that you probably did not hit the limit of the GigE backend, this isn't a bad result. You will start to see the OST targets fill up with data (Listing 11).
Listing 11: Writing to OST Targets
$ lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
testfs-MDT0000_UUID     22419556       10800    19944604   1% /lustre[MDT:0]
testfs-OST0000_UUID     23335208     5244648    15577064  26% /lustre[OST:0]
testfs-OST0001_UUID     23335208     5244652    15577060  26% /lustre[OST:1]
testfs-OST0002_UUID     23335208    10487544    10301208  51% /lustre[OST:2]
filesystem_summary:     70005624    20976844    41455332  34% /lustre
This time, rerun IOR, but in read-only mode. The command uses the same number of clients, the same number of processes, and the same transfer size, but each process reads only 1GB (Listing 12).
Listing 12: IOR Read-Only
$ sudo /usr/lib64/mpich/bin/mpirun --host 10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6 /usr/local/bin/ior -F -r -t 64m -k --posix.odirect -D 15 -u -b 1g -o /lustre/test.01
IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
Began               : Tue Jan 25 20:04:11 2022
Command line        : /usr/local/bin/ior -F -r -t 64m -k --posix.odirect -D 15 -u -b 1g -o /lustre/test.01
Machine             : Linux lustre-client1
TestID              : 0
StartTime           : Tue Jan 25 20:04:11 2022
Path                : /lustre/0/test.01.00000000
FS                  : 66.8 GiB   Used FS: 30.0%   Inodes: 47.0 Mi   Used Inodes: 0.0%
Options:
api                 : POSIX
apiVersion          :
test filename       : /lustre/test.01
access              : file-per-process
type                : independent
segments            : 1
ordering in a file  : sequential
ordering inter file : no tasks offsets
nodes               : 4
tasks               : 4
clients per node    : 1
repetitions         : 1
xfersize            : 64 MiB
blocksize           : 1 GiB
aggregate filesize  : 4 GiB
stonewallingTime    : 15
stoneWallingWearOut : 0
Results:
access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
WARNING: Expected aggregate file size       = 4294967296
WARNING: Stat() of aggregate file size      = 21474836480
WARNING: Using actual aggregate bytes moved = 4294967296
WARNING: Maybe caused by deadlineForStonewalling
read      2199.66    34.40      0.108532    1048576    65536      0.002245   1.86       0.278201   1.86       0
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Stonewall(s) Stonewall(MiB) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum
read 2199.66 2199.66 2199.66 0.00 34.37 34.37 34.37 0.00 1.86211 NA NA 0 4 1 1 1 0 1 0 0 1 1073741824 67108864 4096.0 POSIX 0
Finished            : Tue Jan 25 20:04:13 2022
For a virtual machine deployment on a 1GigE network, I get roughly 2.1GiBps (2,199.66MiBps) reads, which, if you think about it, is not bad at all. Imagine this on a much larger configuration with better compute, storage, and network capabilities; more processes per client; and more clients. Such a cluster would scream with speed.
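The bandwidth that IOR reports can be cross-checked from its own summary line: It is the aggregate data moved (the aggs(MiB) column) divided by the mean run time (the Mean(s) column), and dividing by 1,024 converts MiBps to GiBps. The sketch below does that arithmetic with awk on the figures from Listings 10 and 12; bw_mibps and mibps_to_gibps are hypothetical helper names:

```shell
# Bandwidth in MiB/s from aggregate MiB moved and elapsed seconds.
bw_mibps() {
    awk -v mib="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", mib / secs }'
}

# Convert MiB/s to GiB/s (1 GiB = 1024 MiB).
mibps_to_gibps() {
    awk -v mibps="$1" 'BEGIN { printf "%.2f\n", mibps / 1024 }'
}

bw_mibps 20480.0 11.15941    # write run: prints 1835.22
mibps_to_gibps 1835.22       # write run: prints 1.79
mibps_to_gibps 2199.66       # read run: prints 2.15
```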
Conclusion
That is the Lustre high-performance filesystem in a nutshell. To unmount the filesystem from the clients, use the umount command, just as you would unmount any other device from a system:
$ sudo pdsh -w 10.0.0.[3-6] umount /lustre
Lustre is not the only distributed filesystem of its kind; alternatives include IBM's GPFS, BeeGFS, and plenty more. Despite the competition, Lustre is both stable and reliable and has cemented itself in the HPC space for nearly two decades; it is not going anywhere.
For Further Reading
[1] The Lustre Project: https://www.lustre.org
[2] The Lustre Project Wiki: https://wiki.lustre.org/Main_Page
[3] The Lustre Documentation: https://doc.lustre.org/lustre_manual.xhtml
[4] The IOR (and MDtest) GitHub Project: https://github.com/hpc/ior
About the Author
Petros Koutoupis is currently a senior performance software engineer at Cray (now HPE) for its Lustre High Performance File System division. He is also the creator and maintainer of the RapidDisk Project (www.rapiddisk.org). Petros has worked in the data storage industry for well over a decade and has helped to pioneer the many technologies unleashed in the wild today.