Lustre HPC distributed filesystem

Radiance

Article from ADMIN 68/2022
The Lustre open source distributed, parallel filesystem scales to high-performance computing environments.

What do you do when you need to deploy a large filesystem that scales to the exabyte level and supports simultaneous access from a large number of clients? You turn to a parallel distributed filesystem such as Lustre. In this article, I build the high-performance Lustre filesystem from source, install it on multiple machines, mount it from clients, and access it in parallel.

Lustre Filesystems

A distributed filesystem lets multiple hosts on a computer network access the same files, which makes it possible for multiple users on multiple client machines to share files and storage resources. The client machines do not have direct access to the underlying block storage holding those files; instead, they communicate with a set or cluster of server machines hosting the files and the filesystem to which they are written.

Lustre (a portmanteau of Linux and cluster) [1]-[3] is one such distributed filesystem, usually deployed for large-scale cluster high-performance computing (HPC). Licensed under the GNU General Public License (GPL), Lustre delivers high performance, scales to tens of thousands of nodes (clients included) and exabytes of storage, and remains relatively simple to deploy and configure. As of this writing, the Lustre project is at version 2.14, nearing the official release of 2.15 (currently under development), which will be the next long-term support (LTS) release.

Lustre has a somewhat unique architecture, with four major functional units: (1) a single Management Server (MGS), which can be hosted on its own machine or on one of the metadata machines; (2) the Metadata Servers (MDSs), each of which serves one or more Metadata Targets (MDTs); (3) the Object Storage Servers (OSSs), which store file data on one or more Object Storage Targets (OSTs); and (4) the clients that access and use the file data.

For each Lustre filesystem, MDTs store namespace metadata, which includes file names, directories, access permissions, and file layouts. MDT data is stored in a dedicated local filesystem on the serving node; the MDS controls file access and tells the client nodes which objects make up a file. One or more MDS nodes can exist in a single Lustre filesystem, each serving one or more MDTs.

An OST is a dedicated object-based filesystem exported for read and write operations. The capacity of a Lustre filesystem is determined by the sum of the total capacities of the OSTs.
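
You can see this roll-up from any mounted client (the mount itself comes later in the article) with the standard lfs utility; the mountpoint below is hypothetical:

$ lfs df -h /mnt/lustre

The output lists each MDT and OST with its capacity and usage, followed by a filesystem_summary line whose capacity is the sum of the OST capacities.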

Lustre presents all clients with a unified namespace for all of the files and data in the filesystem, which allows concurrent and coherent read and write access to the files in the filesystem. When a client accesses a file, it completes a file name lookup on the MDS, and either a new file is created or the layout of an existing file is returned to the client.

Locking the file range on the OST, the client then performs one or more read or write operations on the file but does not modify the objects on the OST directly; instead, it delegates those tasks to the OSS. This approach ensures scalability and improves security and reliability, because it does not allow direct access to the underlying storage, which would increase the risk of filesystem corruption from misbehaving or defective clients.
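
The layout the MDS hands back is visible from any client with the same lfs utility. A brief sketch (the mountpoint and file name are hypothetical):

$ lfs setstripe -c 2 /mnt/lustre/demo   # create an empty file striped across two OSTs
$ lfs getstripe /mnt/lustre/demo        # show which OST objects make up the file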

Although all four components (MGS, MDS, OSS, and client) can run on the same node, they are typically configured on separate nodes communicating over a network.
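
For quick experiments, the source tree built later in this article ships helper scripts that stand up exactly this all-in-one configuration on loopback devices. A sketch, assuming a built tree with the server and test packages installed:

$ cd ~/lustre-release
$ sudo ./lustre/tests/llmount.sh         # format and mount a throwaway single-node filesystem
$ sudo ./lustre/tests/llmountcleanup.sh  # unmount and clean it up again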

Prerequisites

In this article, I use eight nodes, four of which will be configured as client machines and the rest as the servers hosting the Lustre filesystem. Although not required, all eight systems will run CentOS 8.5.2111. As the names imply, the servers will host the target Lustre filesystem; the clients will not only mount it, but also write to it.

For the configuration, you need to build the filesystem packages for both the clients and the servers, which means you will need to install package dependencies from the package repositories:

$ sudo dnf install wget git make gcc kernel-devel epel-release automake binutils libtool bison byacc kernel-headers elfutils-libelf-devel elfutils-libelf kernel-rpm-macros kernel-abi-whitelists keyutils-libs keyutils-libs-devel libnl3 libnl3-devel rpm-build libselinux-devel

Next, enable the powertools repository and install a couple of packages:

$ sudo dnf config-manager --set-enabled powertools
$ sudo dnf install dkms libyaml-devel

To build Lustre from source, you need to grab the updated e2fsprogs packages for your respective distribution and version hosted on the Whamcloud project website [4]. In this case, I downloaded and installed the necessary packages for my system [5], shown in Table 1.

Table 1

e2fsprogs Packages

e2fsprogs-1.46.2.wc4-0.el8.x86_64.rpm
e2fsprogs-devel-1.46.2.wc4-0.el8.x86_64.rpm
e2fsprogs-libs-1.46.2.wc4-0.el8.x86_64.rpm
libcom_err-1.46.2.wc4-0.el8.x86_64.rpm
libcom_err-devel-1.46.2.wc4-0.el8.x86_64.rpm
libss-1.46.2.wc4-0.el8.x86_64.rpm
libss-devel-1.46.2.wc4-0.el8.x86_64.rpm
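
The article doesn't show the installation step itself; with the downloaded RPMs sitting in the current directory, one way to install them in a single transaction is:

$ sudo dnf install ./*.wc4-0.el8.x86_64.rpm   # dnf resolves the ordering among the seven packages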

Next, create an RPM build environment, which will only be used once to grab, install, and extract the source kernel packages:

$ mkdir -p ~/rpmbuild/{BUILD,BUILDROOT,RPMS,SOURCES,SPECS,SRPMS}
$ echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
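
A quick way to confirm the macro took effect:

$ rpm --eval '%{_topdir}'   # should now print /home/<your user>/rpmbuild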

The Lustre filesystem relies on a local filesystem to store its objects. The project supports ZFS and a patched version of ext4 called LDISKFS; I use LDISKFS for this build, which requires the ext4 source matching the running kernel. To grab the correct kernel source, make a note of your distribution and its version, as well as the kernel version:

$ cat /etc/redhat-release
CentOS Linux release 8.5.2111
$ uname -r
4.18.0-348.7.1.el8_5.x86_64

The location of the kernel source RPM depends on the distribution and kernel version output above. Listing 1 shows the commands for my setup to grab the kernel source, install the source RPM, change into the directory containing the source objects, and extract the kernel tarball. The final three lines change to the kernel/fs source directory (which should mostly be empty) of the currently installed kernel source, rename the existing ext4 directory, and copy the extracted ext4 source into the current directory.

Listing 1

Working With the Kernel Source

$ wget https://vault.centos.org/8.5.2111/BaseOS/Source/SPackages/kernel-4.18.0-348.7.1.el8_5.src.rpm
$ sudo rpm -ivh kernel-4.18.0-348.7.1.el8_5.src.rpm
$ cd ~/rpmbuild/SOURCES
$ tar xJf linux-4.18.0-348.7.1.el8_5.tar.xz
$ cd /usr/src/kernels/4.18.0-348.7.1.el8_5.x86_64/fs/
$ sudo mv ext4/ ext4.orig
$ sudo cp -r ~/rpmbuild/SOURCES/linux-4.18.0-348.7.1.el8_5/fs/ext4 .
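
As a convenience, the source RPM URL in Listing 1 can be derived from the two commands shown earlier. A sketch, assuming the CentOS vault layout used in the listing (REL and KVER are shell variables introduced here for illustration):

$ REL=$(sed -E 's/.*release ([0-9.]+).*/\1/' /etc/redhat-release)
$ KVER=$(uname -r | sed 's/\.x86_64$//')
$ echo "https://vault.centos.org/${REL}/BaseOS/Source/SPackages/kernel-${KVER}.src.rpm"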

Building Lustre from Source

To begin, check out the Lustre source code in your home directory, change into the source directory, check out the desired branch, and set the version string:

$ cd ~
$ git clone git://git.whamcloud.com/fs/lustre-release.git
$ cd lustre-release
$ git branch
$ git checkout master
$ ./LUSTRE-VERSION-GEN

To build the client packages, enter:

$ sh autogen.sh && ./configure --disable-server && make rpms

When the build completes without error, the RPMs shown in Listing 2 will be listed in the source directory's root.

Listing 2

RPMs After the Client Build

$ ls *.rpm
kmod-lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-client-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-client-tests-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-client-tests-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-2.14.56_111_gf8747a8-1.src.rpm
lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-debugsource-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-devel-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-tests-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-client-tests-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-iokit-2.14.56_111_gf8747a8-1.el8.x86_64.rpm

Now you need to install the client packages on the client nodes and verify that the packages and the version have been installed:

$ sudo dnf install {kmod-,}lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
$ rpm -qa|grep lustre
kmod-lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-client-2.14.56_111_gf8747a8-1.el8.x86_64
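
Although the article doesn't show it, a quick sanity check confirms that the client modules load against the running kernel and that LNet comes up (by default on the first TCP interface):

$ sudo modprobe lustre   # pulls in lnet and the other client modules
$ sudo lctl network up   # start LNet
$ sudo lctl list_nids    # prints this node's network identifier, e.g., <IP>@tcp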

To build the server packages, enter:

$ sh autogen.sh && ./configure && make rpms

When the build completes, you will find the RPMs in Listing 3 in the root of the source directory.

Listing 3

RPMs After the Server Build

[centos@ip-172-31-54-176 lustre-release]$ ls *.rpm
kmod-lustre-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-osd-ldiskfs-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-osd-ldiskfs-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-tests-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
kmod-lustre-tests-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-2.14.56_111_gf8747a8-1.src.rpm
lustre-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-debugsource-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-devel-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-iokit-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-osd-ldiskfs-mount-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-osd-ldiskfs-mount-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-resource-agents-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-tests-2.14.56_111_gf8747a8-1.el8.x86_64.rpm
lustre-tests-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64.rpm

To install the binary packages on the nodes designated as servers (the glob deliberately excludes the source RPM, which dnf will not install), enter:

$ sudo dnf install *.x86_64.rpm

Then, verify that the packages and the version have been installed. I installed the packages shown in Listing 4. Before proceeding, please read the "Configuring the Servers" box.

Configuring the Servers

Purely for convenience, I deployed virtual machines to host this entire tutorial, which also limits me to a 1 Gigabit Ethernet (GigE) network. Each virtual machine designated to host the Lustre filesystem has a secondary drive of approximately 50GB attached.

Listing 4

Installed Packages on the Server Nodes

[centos@ip-172-31-54-176 RPMS]$ rpm -qa|grep lustre
lustre-osd-ldiskfs-mount-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-osd-ldiskfs-mount-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64
kmod-lustre-osd-ldiskfs-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64
kmod-lustre-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64
kmod-lustre-osd-ldiskfs-debuginfo-2.14.56_111_gf8747a8-1.el8.x86_64
kmod-lustre-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-iokit-2.14.56_111_gf8747a8-1.el8.x86_64
lustre-debugsource-2.14.56_111_gf8747a8-1.el8.x86_64
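
The next step, formatting the Lustre targets on each server's secondary drive, falls outside this excerpt, but a minimal sketch of what follows looks like this, assuming the drive appears as /dev/sdb, the filesystem is named lustre, and the MGS and MDS share a node:

$ sudo mkfs.lustre --fsname=lustre --mgs --mdt --index=0 /dev/sdb                 # on the combined MGS/MDS node
$ sudo mkfs.lustre --fsname=lustre --ost --index=0 --mgsnode=<MGS NID> /dev/sdb   # on each OSS, with a unique index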
