Lustre HPC distributed filesystem

Radiance

Preparing the Metadata Servers

You now have Lustre builds for both the clients and the servers; the focus now shifts to using those builds to configure each. Although a separate node could have been used to host the management service (i.e., the MGS), I opted instead to have the first MDS, which hosts the first MDT, also act as the management service. To do this, add the --mgs option when formatting the device for Lustre. A Lustre deployment can host one or more MDT devices, but in this example I format just one (Listing 5). If you do choose to format additional MDTs, be sure to increment the value of the index parameter by one each time and to specify the node ID (NID) of the MGS node with --mgsnode=<NID> (shown in the "Preparing the Object Storage Servers" section).

Listing 5

Formatting the MDT

$ sudo mkfs.lustre --fsname=testfs --index=0 --mgs --mdt /dev/sdb
   Permanent disk data:
Target:     testfs:MDT0000
Index:      0
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
checking for existing Lustre data: not found
device size = 48128MB
formatting backing filesystem ldiskfs on /dev/sdb
        target name   testfs:MDT0000
        kilobytes     49283072
        options       -I 512 -i 1024 -J size=1925 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,project,huge_file,ea_inode,large_dir,flex_bg -E lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L testfs:MDT0000 -I 512 -i 1024 -J size=1925 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,project,huge_file,ea_inode,large_dir,flex_bg -E lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdb 49283072k
Writing CONFIGS/mountdata
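The indexing rule described above can be sketched as a small loop that prints (but does not run) the format commands for two additional MDTs. The device /dev/sdc is a hypothetical example; the MGS NID is the one used throughout this article:

```shell
# Sketch: print (do not run) the format commands for two additional MDTs.
# /dev/sdc is a hypothetical device on each additional MDS node; the
# MGS NID 10.0.0.2@tcp0 comes from this article's setup.
mdt_cmds=""
for idx in 1 2; do
  mdt_cmds="${mdt_cmds}sudo mkfs.lustre --fsname=testfs --index=${idx} --mgsnode=10.0.0.2@tcp0 --mdt /dev/sdc
"
done
printf '%s' "$mdt_cmds"
```

Because mkfs.lustre wipes the target device, printing the commands first and reviewing them before running each on its intended node is a reasonable precaution.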

Now create a mountpoint to host the MDT and then mount it:

$ sudo mkdir /mnt/mdt
$ sudo mount -t lustre /dev/sdb /mnt/mdt/
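If you want the MDT to mount automatically at boot, the same mount can be recorded in /etc/fstab. This is only a sketch; the _netdev option (which delays the mount until networking is up) and the device path are assumptions to adapt to your system:

```
/dev/sdb  /mnt/mdt  lustre  defaults,_netdev  0 0
```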

Because I am not using LDAP and am simply trusting my clients (and their users) for this example, I need to execute the following on the same MGS node:

$ sudo lctl set_param mdt.*.identity_upcall=NONE

Note that the above command should NOT be used in production, because it effectively disables identity verification on the MDS and can lead to security problems; production deployments typically leave the identity upcall pointed at the l_getidentity helper instead.

Make note of the management server's IP address (Listing 6); it forms the host portion of the Lustre Networking (LNET) NID, which can be verified by:

$ sudo lctl list_nids
10.0.0.2@tcp

Listing 6

Management Server

$ sudo ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.0.0.2  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::bfd3:1a4b:f76b:872a  prefixlen 64  scopeid 0x20<link>
        ether 42:01:0a:80:00:02  txqueuelen 1000  (Ethernet)
        RX packets 11919  bytes 61663030 (58.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10455  bytes 973590 (950.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

LNET is Lustre's network communication protocol, which is designed to be lightweight and efficient. It supports message passing for remote procedure call (RPC) request processes and remote direct memory access (RDMA) for bulk data movement. All metadata and file data I/O are managed through LNET.
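LNET chooses its network and interface from kernel module options. On nodes with more than one interface, the network can be pinned explicitly in a modprobe configuration file. The file name below is a common convention, not a requirement, and the interface name is an assumption for this example:

```
# /etc/modprobe.d/lustre.conf
options lnet networks=tcp0(eth0)
```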

Preparing the Object Storage Servers

On the next server, I format the secondary storage volume as the first OST, with an index of 0, pointing it at the MGS node with --mgsnode=10.0.0.2@tcp0 (Listing 7). Then, I create a mountpoint to host the OST and mount it:

$ sudo mkdir /mnt/ost
$ sudo mount -t lustre /dev/sdb /mnt/ost/

Listing 7

Format the OST

$ sudo mkfs.lustre --reformat --index=0 --fsname=testfs --ost --mgsnode=10.0.0.2@tcp0 /dev/sdb
   Permanent disk data:
Target:     testfs:OST0000
Index:      0
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.0.0.2@tcp
device size = 48128MB
formatting backing filesystem ldiskfs on /dev/sdb
        target name   testfs:OST0000
        kilobytes     49283072
        options       -I 512 -i 1024 -J size=1024 -q -O extents,uninit_bg,dir_nlink,quota,project,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0000 -I 512 -i 1024 -J size=1024 -q -O extents,uninit_bg,dir_nlink,quota,project,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdb 49283072k
Writing CONFIGS/mountdata

On the rest of the nodes I follow the same procedure, again incrementing the index parameter value by one each time (Listing 8). Be sure to create the local mountpoint for each OST and then mount it.
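The per-node bookkeeping (one OST index per server) can be sketched as a loop that prints each node's format command for review. The OSS host addresses and their pairing with indexes are assumptions for this example; each command would be run on its own node:

```shell
# Print one mkfs.lustre command per OSS node, incrementing the index.
# The hosts are hypothetical OSS addresses; run each command on its node.
idx=0
ost_cmds=""
for host in 10.0.0.10 10.0.0.11 10.0.0.12; do
  ost_cmds="${ost_cmds}${host}: sudo mkfs.lustre --reformat --index=${idx} --fsname=testfs --ost --mgsnode=10.0.0.2@tcp0 /dev/sdb
"
  idx=$((idx + 1))
done
printf '%s' "$ost_cmds"
```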

Listing 8

The Rest of the Nodes

$ sudo mkfs.lustre --reformat --index=1 --fsname=testfs --ost --mgsnode=10.0.0.2@tcp0 /dev/sdb
   Permanent disk data:
Target:     testfs:OST0001
Index:      1
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x62
[ ... ]

Using the Clients

To mount the filesystem on a client, you need to specify the filesystem type, the NID of the MGS, the filesystem's name, and the mountpoint on which to mount it. The template for the command and the command I used are:

mount -t lustre <MGS NID>:/<fsname> <mountpoint>
mount -t lustre 10.0.0.2@tcp0:/testfs /lustre
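For clients that should mount the filesystem at boot, the same template can be expressed as an /etc/fstab entry. This is a sketch using the NID and mountpoint from this article; the _netdev option, which delays the mount until networking is up, is an assumption to adapt to your distribution:

```
10.0.0.2@tcp0:/testfs  /lustre  lustre  defaults,_netdev  0 0
```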

In the examples below, I will be relying on pdsh to run commands on multiple remote hosts simultaneously. All four clients will need a local directory to mount the remote filesystem,

$ sudo pdsh -w 10.0.0.[3-6] mkdir -pv /lustre

after which, you can mount the remote filesystem on all clients:

$ sudo pdsh -w 10.0.0.[3-6] mount -t lustre 10.0.0.2@tcp0:/testfs /lustre

Each client now has access to the remote Lustre filesystem. The filesystem is currently empty:

$ sudo ls /lustre/
$

As a quick test, create an empty file and verify that it has been created:

$ sudo touch /lustre/test.txt
$ sudo ls /lustre/
test.txt

All four clients should be able to see the same file:

$ sudo pdsh -w 10.0.0.[3-6] ls /lustre
10.0.0.3: test.txt
10.0.0.5: test.txt
10.0.0.6: test.txt
10.0.0.4: test.txt

You can consolidate the output with dshbak so that you do not see the same response repeated for every node:

$ sudo pdsh -w 10.0.0.[3-6] ls /lustre | dshbak -c
----------------
10.0.0.[3-6]
----------------
test.txt
