Building an HPC cluster with Warewulf 4
Time and Resource Management
A Warewulf-configured cluster head node with bootable, stateless compute nodes [1] is a first step in building a cluster. Although you can run jobs at this point, some additions need to be made to make it more functional. In this article, I'll show you how to configure time so that the head node and the compute node are in sync. This step is more important than some people realize. For example, I have seen Message Passing Interface (MPI) applications that have failed because the clocks on two of the nodes were far out of sync.
Next, you will want to install a resource manager (job scheduler) that allows you to queue up jobs, so you don't have to sit at the terminal waiting for jobs to finish. In this example, I use Slurm. You can also share the cluster with other users and have jobs run when the resources you need are available. This component is a key to creating an HPC cluster.
NTP
One of the absolute key tools for clusters is the Network Time Protocol (NTP), which syncs the system clocks either to each other, or to a standard atomic clock (or close to it), or both. With clocks in sync, the many tools and libraries on clusters such as MPI will function correctly.
On Rocky Linux 8, I use chrony [2] to sync clocks on both the client and the server. In the case of the cluster, the head node is a client to the outside world, but it will also act as a time server to the compute nodes within the cluster.
Installing chrony on the head node with yum or dnf is easy. During installation, a default /etc/chrony.conf configuration file is created, but I modified mine to keep it really simple:
server 2.rocky.pool.ntp.org
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 10.0.0.0/8
local stratum 10
keyfile /etc/chrony.keys
leapsectz right/UTC
logdir /var/log/chrony
I pointed the head node to 2.rocky.pool.ntp.org as the source of time updates in the outside world (this came with the default). I also allowed clients in the 10.0.0.0/8 address range to use the head node as a time server.
After you edit the file, you should restart the chrony service:
$ sudo systemctl restart chronyd
I also like to make sure it will start automatically on boot, so I run:
$ sudo systemctl enable chronyd
Although it is probably not necessary, because it was enabled when installed, I like to be sure. I usually also check that it's running with systemctl when I restart the head node.
At this point you can test whether the clock is synchronized by installing the ntpstat utility on the head node and then running it:
$ sudo yum install ntpstat
$ ntpstat
synchronised to NTP server (162.159.200.123) at stratum 4
   time correct to within 21 ms
   polling server every 64 s
Your output will not match this exactly, but you can see that it's using an outside source to synchronize the clock.
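Because chrony ships its own client, chronyc, a similar check can be made without installing ntpstat. The following is a minimal sketch, not part of the original setup: the helper name sync_status is made up for illustration, and it assumes the "Leap status" field that chronyc tracking prints once the daemon has a usable time source.

```shell
#!/bin/bash
# Hypothetical helper: reads `chronyc tracking` output on stdin and
# prints "synchronized" or "not synchronized".
# Assumption: chronyd reports "Leap status : Normal" once it is in sync.
sync_status() {
    if grep -Eq '^Leap status[[:space:]]*:[[:space:]]*Normal'; then
        echo "synchronized"
    else
        echo "not synchronized"
    fi
}

# Only query chronyd when the client is actually installed.
if command -v chronyc >/dev/null 2>&1; then
    chronyc tracking | sync_status
fi
```

Running chronyc sources on the head node additionally lists the servers that chronyd is currently using.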
Configuring time on the compute node is a bit different from the head node, requiring a few more steps. The first difference is that the compute node has no time zone associated with it, and I like to keep the compute nodes as close as possible to the head node. If you try to set a time zone in the container, it won't work because the container is not running. You can either set the time zone manually on the compute node before running any jobs, or you can create a simple systemd script that runs on startup. I'm going to choose the second approach to automate things.
To create a simple script that is run by the system when the node starts but after the network is up, you should put the script where changes local to the node should reside: in /usr/local/bin. Begin by exec-ing into the container:
$ sudo wwctl container exec rocky-8 /bin/bash
[rocky-8] Warewulf>
Next, create a script in /usr/local/bin/ (I named mine timezone_fix.sh).
#!/bin/bash
timedatectl set-timezone America/New_York
Adjust the time zone value for your cluster. (You can find the time zone of your head node with the command timedatectl.) In my case, it is America/New_York. The command timedatectl set-timezone sets the time zone. Be sure to make the script executable:
[rocky-8] Warewulf> chmod u+x /usr/local/bin/timezone_fix.sh
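Rather than typing the time zone by hand, the script can be generated from the head node's own setting. This is a rough sketch under a few assumptions: write_tz_script is a made-up helper name, the output path is a placeholder for wherever the container's filesystem is reachable, and timedatectl show is assumed to be available (it should be in the systemd version shipped with Rocky Linux 8).

```shell
#!/bin/bash
# Hypothetical helper: write a startup script that sets the given time zone.
#   $1 = time zone name (e.g., America/New_York)
#   $2 = path of the script to create
write_tz_script() {
    printf '#!/bin/bash\ntimedatectl set-timezone %s\n' "$1" > "$2" &&
    chmod u+x "$2"
}

# Example use on the head node (the output path is a placeholder):
#   zone=$(timedatectl show --property=Timezone --value)
#   write_tz_script "$zone" /path/to/container/usr/local/bin/timezone_fix.sh
```

Generating the file this way keeps the container in step with the head node if the head node's time zone is ever changed.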
After creating that script, create the systemd service that runs it in file /etc/systemd/system/run-at-startup.service, so the system knows about the script:
[Unit]
Description=Script to set time zone to EDT

[Service]
Type=simple
RemainAfterExit=no
ExecStart=/usr/local/bin/timezone_fix.sh
TimeoutStartSec=0

[Install]
WantedBy=default.target
The final step is to enable the run-at-startup service:
[rocky-8] Warewulf> systemctl enable run-at-startup.service
Created symlink /etc/systemd/system/default.target.wants/run-at-startup.service → /etc/systemd/system/run-at-startup.service
With these additions, the time zone in the compute node will match the head node. Again, I don't think it's strictly required, but I like to have it. Now you can install chrony into the compute node container, as you did for the head node:
$ yum install chrony ntpstat
The /etc/chrony.conf file for the compute node is similar to the head node:
server 10.0.0.1
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 10.0.0.0/8
local stratum 10
keyfile /etc/chrony.keys
leapsectz right/UTC
logdir /var/log/chrony
For the compute node, I just point chrony to the head node (warewulf or IP 10.0.0.1) as the source of time (time server), because the head node itself is synchronized to an outside NTP server. Strictly speaking, I think you really only need the head node and compute nodes to be in sync with each other, but you might as well sync the head node to the true time outside of the cluster.
To make sure chrony starts when the container boots, I enable the service inside the container,
$ systemctl enable chronyd
and type exit to leave the container. Be sure that it rebuilds the container when you exit; otherwise, you have to go back and redo everything. To make sure NTP is working, boot the compute node and run timedatectl (Listing 1).
Listing 1
Running timedatectl
$ ssh n0001
[laytonjb@n0001 ~]$ timedatectl
               Local time: Sat 2022-12-17 11:31:26 EST
           Universal time: Sat 2022-12-17 16:31:26 UTC
                 RTC time: Sat 2022-12-17 16:31:26
                Time zone: America/New_York (EST, -0500)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
Everything looks good at this point. The time zone matches the head node, and the node is time syncing. Another thing to look at is the ntpstat output:
$ ntpstat
synchronised to NTP server (10.0.0.1) at stratum 5
   time correct to within 48 ms
   polling server every 64 s
The NTP server is correct and all looks good. On to the next step!
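A quick, coarse cross-check is to compare the two clocks directly. The sketch below is an illustration rather than part of the original procedure: skew_seconds is a hypothetical helper, n0001 is the node name used in this article, and the result includes SSH latency, so only differences of more than a second or two mean anything.

```shell
#!/bin/bash
# Hypothetical helper: absolute difference between two epoch timestamps.
skew_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( -d ))
    fi
    echo "$d"
}

# Compare the head node clock with a compute node clock over SSH.
# Runs only when a node name is given, e.g.: ./skew.sh n0001
if [ -n "$1" ]; then
    remote=$(ssh "$1" date +%s) || exit 1
    echo "skew to $1: $(skew_seconds "$(date +%s)" "$remote") s"
fi
```

For anything finer than whole seconds, the offsets reported by chrony itself are the better source.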
Slurm
Now that time is synchronized between the head node and compute nodes, I like to install the resource manager (aka, the job scheduler). I chose Slurm for my cluster because it is so ubiquitous, but you have several others from which to choose.
I must admit that I had a difficult time getting Slurm to run with my installation method. The method itself might be the problem, or perhaps not. (I'm sure it was my fault, though.) Regardless, with the help of several people on the mailing lists, I got it running.
The process I followed is in a recipe by Steve Jones from Stanford University. He has a nice recipe for his system that creates nodes with virtual machines (VMs) [3] that can be used as a template for physical nodes. I didn't use the entire recipe, only those parts near the end that applied to installing and configuring Slurm.
His recipe uses the Slurm RPMs from OpenHPC [4], which I like using for several reasons: They are cluster oriented; OpenHPC will be switching to Warewulf 4 soon, so they have preliminary binaries; and I didn't have to build Slurm from scratch. The first step in using these RPMs is to add the OpenHPC repository. After a little hunting I found the release file and installed it on the head node:
$ sudo yum install http://repos.openhpc.community/OpenHPC/2/EL_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm
After installing the release RPM, I installed munge, the Slurm authentication tool (Listing 2), then I installed the Slurm server on the head node (Listing 3).
Listing 2
Installing munge
$ sudo yum install munge
Last metadata expiration check: 0:03:14 ago on Sun 04 Dec 2022 08:29:36 AM EST.
Dependencies resolved.
================================================================================================
 Package  Architecture  Version  Repository  Size
================================================================================================
Installing:
 munge x86_64 0.5.13-2.el8 appstream 121 k
Installing dependencies:
 munge-libs x86_64 0.5.13-2.el8 appstream 29 k

Transaction Summary
================================================================================================
Install 2 Packages
...
Listing 3
Install Slurm Server
$ sudo yum install ohpc-slurm-server
Last metadata expiration check: 0:22:28 ago on Sun 04 Dec 2022 08:29:36 AM EST.
Dependencies resolved.
===============================================================================================
 Package  Arch  Version  Repository  Size
===============================================================================================
Installing:
 ohpc-slurm-server x86_64 2.6-7.1.ohpc.2.6 OpenHPC-updates 7.0 k
Installing dependencies:
 mariadb-connector-c x86_64 3.1.11-2.el8_3 appstream 199 k
 mariadb-connector-c-config noarch 3.1.11-2.el8_3 appstream 14 k
 ohpc-filesystem noarch 2.6-2.3.ohpc.2.6 OpenHPC-updates 8.0 k
 pdsh-mod-slurm-ohpc x86_64 2.34-9.1.ohpc.2.6 OpenHPC-updates 13 k
 slurm-devel-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 83 k
 slurm-example-configs-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 242 k
 slurm-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 18 M
 slurm-perlapi-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 822 k
 slurm-slurmctld-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 1.5 M
 slurm-slurmdbd-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 836 k

Transaction Summary
===============================================================================================
Install 11 Packages
...
Everything should go fine through this step. If you have hiccups, I recommend posting to the slurm-users mailing list, the warewulf mailing list, or both.
Next, you need to create and edit the slurm.conf file. Some files in /etc/slurm/ are part of the Slurm server installation. You will use these templated files later.
For now, use the slurm.conf.ohpc template file:
$ sudo cp /etc/slurm/slurm.conf.ohpc /etc/slurm/slurm.conf
Jones's recipe uses some Perl commands to edit that file, which is used on the head node (the Slurm server) (Listing 4). These commands are fairly easy to understand, even if you don't know Perl. On the second and third lines I changed the name of the compute node to match my node (n0001).
Listing 4
Edit the Template File
$ sudo perl -pi -e "s/ControlMachine=\S+/ControlMachine=`hostname -s`/" /etc/slurm/slurm.conf
$ sudo perl -pi -e "s/^NodeName=(\S+)/NodeName=n0001/" /etc/slurm/slurm.conf
$ sudo perl -pi -e "s/^PartitionName=normal Nodes=(\S+)/PartitionName=normal Nodes=n0001/" /etc/slurm/slurm.conf
$ sudo perl -pi -e "s/ Nodes=c\S+ / Nodes=ALL /" /etc/slurm/slurm.conf
$ sudo perl -pi -e "s/ReturnToService=1/ReturnToService=2/" /etc/slurm/slurm.conf
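Because the substitutions are applied silently, it is worth looking at the result before starting any services. A small sketch of such a check follows; show_slurm_keys is a name invented here, and the list of keys is simply the ones touched by the Perl commands in Listing 4.

```shell
#!/bin/bash
# Hypothetical helper: print the slurm.conf lines touched by the Perl edits.
#   $1 = path to slurm.conf (defaults to /etc/slurm/slurm.conf)
show_slurm_keys() {
    grep -E '^(ControlMachine|NodeName|PartitionName|ReturnToService)=' \
        "${1:-/etc/slurm/slurm.conf}"
}

# Print the current values if the file exists on this machine.
if [ -r /etc/slurm/slurm.conf ]; then
    show_slurm_keys
fi
```

If one of the keys is missing from the output, the corresponding pattern did not match the template and the line has to be edited by hand.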
You should also set the munge and slurmctld services to start when the head node boots:
$ sudo systemctl enable --now munge
$ sudo systemctl enable --now slurmctld
A few more modifications need to be made on the Slurm head node. Edit the lines in the /etc/slurm/slurm.conf file as follows:
...
SlurmctldAddr=10.0.0.1
...
SlurmctldLogFile=/var/log/slurm/slurmctld.log
...
SlurmdLogFile=/var/log/slurm/slurmd.log
...
The first line points to the head node's IP address. The other two lines tell Slurm where to write the controller (slurmctld) and compute node daemon (slurmd) logs. If the log directory doesn't exist, you will have to create it and chown it to slurm:slurm:
$ sudo mkdir /var/log/slurm
$ sudo chown slurm:slurm /var/log/slurm
Another edit you will have to make in /etc/slurm/slurm.conf is to the line that begins NodeName=. It should reflect the node name, the number of sockets, the number of cores per socket, and the number of threads per core. For me, this line is
NodeName=n0001 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
for my compute node because I have a single socket with a four-core processor that has hyperthreading turned on (two threads per core). You should change this line to reflect your compute node.
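If you are unsure of the values, they can be read from the compute node itself. The sketch below builds the line from lscpu output; node_line is a hypothetical helper, and the field labels it matches are those of an English-locale lscpu. Slurm's own slurmd -C, run on the compute node, prints the hardware configuration it detects and is worth comparing against.

```shell
#!/bin/bash
# Hypothetical helper: turn lscpu output (stdin) into a NodeName line.
#   $1 = node name
node_line() {
    awk -F: -v name="$1" '
        # Strip blanks from the value part of "Label:   value".
        { gsub(/[ \t]/, "", $2) }
        /^Socket\(s\)/          { sockets = $2 }
        /^Core\(s\) per socket/ { cores = $2 }
        /^Thread\(s\) per core/ { threads = $2 }
        END {
            printf "NodeName=%s Sockets=%s CoresPerSocket=%s ThreadsPerCore=%s State=UNKNOWN\n",
                   name, sockets, cores, threads
        }'
}

# On a compute node (the label text assumes the C locale):
#   LC_ALL=C lscpu | node_line "$(hostname -s)"
```

The generated line still belongs in /etc/slurm/slurm.conf on the head node, which is the copy that gets distributed to the container.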
It's also good to check that the directory /var/lib/munge exists (it should). The owner should be munge:munge on the directory and any files in there. If the directory doesn't exist, that is a problem and you need to create it, chown it to munge:munge, and reinstall the OpenHPC Slurm server RPM. If you see the file /var/lib/munge/munge.seed and it is owned by munge:munge, you should be good.
Also check that the directory /var/spool/slurmctld exists and is owned by slurm:slurm. A number of files in that directory should also be owned by slurm:slurm. If you don't see the directory or the files, create the directory, chown it to slurm:slurm, and reinstall the OpenHPC Slurm server RPM.
The next thing to check is for the existence of the file /etc/slurm/cgroup.conf. If the file doesn't exist, there might be a file named cgroup.conf.example in that same directory. If so, copy it to cgroup.conf:
$ sudo cp /etc/slurm/cgroup.conf.example /etc/slurm/cgroup.conf
Now add a single line to the end of /etc/slurm/cgroup.conf so the file looks like Listing 5. This last line is what stumped me for a while until Jason Stover helped (thanks Jason!).
Listing 5
Modification for /etc/slurm/cgroup.conf
$ sudo more /etc/slurm/cgroup.conf
###
#
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
#--
CgroupAutomount=yes
ConstrainCores=no
ConstrainRAMSpace=no
CgroupMountpoint=/sys/fs/cgroup
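If you script this step, it helps to make the edit idempotent so that rerunning the script does not append the line twice. A minimal sketch, with ensure_line as a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if it is not already there.
#   $1 = exact line to add, $2 = file
ensure_line() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example (as root on the head node):
#   ensure_line 'CgroupMountpoint=/sys/fs/cgroup' /etc/slurm/cgroup.conf
```

The match is on the whole line as a fixed string, so a commented-out copy of the setting does not count as present.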
I realize this seems like quite a bit of fiddling, but this is what I had to do to get Slurm to work, and it's not bad because it only needs to be done once. You can choose to build and install Slurm yourself or use different RPMs.
Now I come to the fun part, the compute node, which is a bit different from the head node and requires maybe a little more fiddling around. In reality, though, you only have to do this once per container. You can even script this if you like, especially if you are going to use several containers.
The first step is to create the users slurm and munge in the container along with their groups before installing anything. This part is critical: the user ID (UID) and group ID (GID) of the slurm and munge users and groups in the container must match those on the head node. On my head node, the group entries for slurm and munge are:
munge:x:970:
slurm:x:202:
The entries for the corresponding users are:
munge:x:972:970:Runs Uid 'N' Gid Emporium:/var/run/munge:/sbin/nologin
slurm:x:202:202:SLURM resource manager:/etc/slurm:/sbin/nologin
Write down the GIDs and names and UIDs and names, and then exec into the container and mount the host filesystem in the container. Once in the container, you can create the appropriate groups and users (Listing 6).
Listing 6
Mount the Host Filesystem
$ sudo wwctl container exec --bind /:/mnt rocky-8 /bin/bash
[rocky-8] Warewulf> groupadd -g 970 munge
[rocky-8] Warewulf> groupadd -g 202 slurm
[rocky-8] Warewulf> useradd -g 970 -u 972 munge
[rocky-8] Warewulf> useradd -g 202 -u 202 slurm
Check these against the head node before continuing; this is a very important step. Don't exit from the container just yet. You need to install the OpenHPC release RPM to use their RPMs:
[rocky-8] Warewulf> yum install http://repos.openhpc.community/OpenHPC/2/EL_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm
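Because the head node filesystem is mounted at /mnt, the IDs can be compared from inside the container without any copying. The following sketch is one way to do it; ids_of and same_ids are hypothetical helpers, and the paths in the usage comment assume the --bind /:/mnt mount from Listing 6.

```shell
#!/bin/bash
# Hypothetical helper: print "UID:GID" for a user from a passwd-format file.
#   $1 = user name, $2 = passwd file
ids_of() {
    awk -F: -v u="$1" '$1 == u { print $3 ":" $4 }' "$2"
}

# Hypothetical helper: succeed only if the user has the same IDs in both files.
#   $1 = user name, $2 = head node passwd, $3 = container passwd
same_ids() {
    a=$(ids_of "$1" "$2")
    b=$(ids_of "$1" "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Inside the container, with the head node mounted at /mnt:
#   for u in munge slurm; do
#       same_ids "$u" /mnt/etc/passwd /etc/passwd || echo "MISMATCH: $u"
#   done
```

This compares only the passwd entries; the group files can be compared the same way using the third field of /etc/group.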
Next you can install the OpenHPC Slurm client (Listing 7). Like the head node, you now need to fix up the Slurm client installation.
Listing 7
Install OpenHPC Slurm Client
[rocky-8] Warewulf> yum install ohpc-slurm-client
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 0:04:01 ago on Sun Dec 11 14:26:55 2022.
Dependencies resolved.
===============================================================================================
 Package  Arch  Version  Repository  Size
===============================================================================================
Installing:
 ohpc-slurm-client x86_64 2.6-7.1.ohpc.2.6 OpenHPC-updates 6.9 k
Installing dependencies:
 cairo x86_64 1.15.12-6.el8 appstream 718 k
 dejavu-fonts-common noarch 2.35-7.el8 baseos 73 k
 dejavu-sans-fonts noarch 2.35-7.el8 baseos 1.5 M
 fontconfig x86_64 2.13.1-4.el8 baseos 273 k
 fontpackages-filesystem noarch 1.44-22.el8 baseos 15 k
 freetype x86_64 2.9.1-9.el8 baseos 393 k
 groff-base x86_64 1.22.3-18.el8 baseos 1.0 M
 hwloc-ohpc x86_64 2.7.0-3.9.ohpc.2.6 OpenHPC-updates 2.6 M
 libX11 x86_64 1.6.8-5.el8 appstream 610 k
 libX11-common noarch 1.6.8-5.el8 appstream 157 k
 libXau x86_64 1.0.9-3.el8 appstream 36 k
 libXext x86_64 1.3.4-1.el8 appstream 44 k
 libXrender x86_64 0.9.10-7.el8 appstream 32 k
 libpng x86_64 2:1.6.34-5.el8 baseos 125 k
 libxcb x86_64 1.13.1-1.el8 appstream 228 k
 mariadb-connector-c x86_64 3.1.11-2.el8_3 appstream 199 k
 mariadb-connector-c-config noarch 3.1.11-2.el8_3 appstream 14 k
 munge x86_64 0.5.13-2.el8 appstream 121 k
 munge-libs x86_64 0.5.13-2.el8 appstream 29 k
 numactl-libs x86_64 2.0.12-13.el8 baseos 35 k
 ohpc-filesystem noarch 2.6-2.3.ohpc.2.6 OpenHPC-updates 8.0 k
 perl-Carp noarch 1.42-396.el8 baseos 29 k
 perl-Data-Dumper x86_64 2.167-399.el8 baseos 57 k
 perl-Digest noarch 1.17-395.el8 appstream 26 k
 perl-Digest-MD5 x86_64 2.55-396.el8 appstream 36 k
 perl-Encode x86_64 4:2.97-3.el8 baseos 1.5 M
 perl-Errno x86_64 1.28-421.el8 baseos 75 k
 perl-Exporter noarch 5.72-396.el8 baseos 33 k
 perl-File-Path noarch 2.15-2.el8 baseos 37 k
 perl-File-Temp noarch 0.230.600-1.el8 baseos 62 k
 perl-Getopt-Long noarch 1:2.50-4.el8 baseos 62 k
 perl-HTTP-Tiny noarch 0.074-1.el8 baseos 57 k
 perl-IO x86_64 1.38-421.el8 baseos 141 k
 perl-MIME-Base64 x86_64 3.15-396.el8 baseos 30 k
 perl-Net-SSLeay x86_64 1.88-2.module+el8.6.0+957+15d660ad appstream 378 k
 perl-PathTools x86_64 3.74-1.el8 baseos 89 k
 perl-Pod-Escapes noarch 1:1.07-395.el8 baseos 19 k
 perl-Pod-Perldoc noarch 3.28-396.el8 baseos 85 k
 perl-Pod-Simple noarch 1:3.35-395.el8 baseos 212 k
 perl-Pod-Usage noarch 4:1.69-395.el8 baseos 33 k
 perl-Scalar-List-Utils x86_64 3:1.49-2.el8 baseos 67 k
 perl-Socket x86_64 4:2.027-3.el8 baseos 58 k
 perl-Storable x86_64 1:3.11-3.el8 baseos 97 k
 perl-Term-ANSIColor noarch 4.06-396.el8 baseos 45 k
 perl-Term-Cap noarch 1.17-395.el8 baseos 22 k
 perl-Text-ParseWords noarch 3.30-395.el8 baseos 17 k
 perl-Text-Tabs+Wrap noarch 2013.0523-395.el8 baseos 23 k
 perl-Time-Local noarch 1:1.280-1.el8 baseos 32 k
 perl-URI noarch 1.73-3.el8 appstream 115 k
 perl-Unicode-Normalize x86_64 1.25-396.el8 baseos 81 k
 perl-constant noarch 1.33-396.el8 baseos 24 k
 perl-interpreter x86_64 4:5.26.3-421.el8 baseos 6.3 M
 perl-libnet noarch 3.11-3.el8 appstream 120 k
 perl-libs x86_64 4:5.26.3-421.el8 baseos 1.6 M
 perl-macros x86_64 4:5.26.3-421.el8 baseos 71 k
 perl-parent noarch 1:0.237-1.el8 baseos 19 k
 perl-podlators noarch 4.11-1.el8 baseos 117 k
 perl-threads x86_64 1:2.21-2.el8 baseos 60 k
 perl-threads-shared x86_64 1.58-2.el8 baseos 47 k
 pixman x86_64 0.38.4-2.el8 appstream 256 k
 slurm-contribs-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 22 k
 slurm-example-configs-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 242 k
 slurm-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 18 M
 slurm-pam_slurm-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 172 k
 slurm-slurmd-ohpc x86_64 22.05.2-14.1.ohpc.2.6 OpenHPC-updates 767 k
Installing weak dependencies:
 perl-IO-Socket-IP noarch 0.39-5.el8 appstream 46 k
 perl-IO-Socket-SSL noarch 2.066-4.module+el8.6.0+957+15d660ad appstream 297 k
 perl-Mozilla-CA noarch 20160104-7.module+el8.6.0+965+850557f9 appstream 14 k

Transaction Summary
===============================================================================================
Install 69 Packages

Total download size: 40 M
Installed size: 144 M
Is this ok [y/N]:
Downloading Packages:
...
To begin, edit the munge configuration. In no specific order, begin by checking that the directory /etc/munge exists. If not, create it and chown it to munge:munge:
[rocky-8] Warewulf> mkdir /etc/munge
[rocky-8] Warewulf> chown munge:munge /etc/munge
Copy the file /etc/munge/munge.key from the head node to the container, which won't be difficult because you mounted the head node filesystem when you exec'd into the container:
[rocky-8] Warewulf> cp /mnt/etc/munge/munge.key /etc/munge/munge.key
cp: overwrite '/etc/munge/munge.key'? y
[rocky-8] Warewulf> ls -lstar /etc/munge/munge.key
4 -r-------- 1 munge munge 1024 Dec 11 14:46 /etc/munge/munge.key
If it asks you whether you want to overwrite the existing munge.key, choose y. Be sure the directory and the file are all owned by munge:munge (UID:GID).
Next, you need to chown the directory /var/lib/munge to munge:munge:
[rocky-8] Warewulf> chown munge:munge /var/lib/munge
Now turn your attention to configuring Slurm in the container by creating the directory /var/spool/slurmd and chown-ing it to slurm:slurm:
[rocky-8] Warewulf> mkdir /var/spool/slurmd
[rocky-8] Warewulf> chown slurm:slurm /var/spool/slurmd
Notice that the directory for the client is slurmd and not slurmctld, which is for the Slurm server.
Next, copy the slurm.conf file from the head node:
[rocky-8] Warewulf> cp /mnt/etc/slurm/slurm.conf /etc/slurm/slurm.conf
You shouldn't have to change anything in the file /etc/slurm/slurm.conf once it is in the container.
Next, create the log directory for Slurm in the container and chown it to slurm:slurm:
[rocky-8] Warewulf> mkdir /var/log/slurm
[rocky-8] Warewulf> chown slurm:slurm /var/log/slurm
One other thing you should use from Jones' recipe is the following command inside the container that sets options for starting Slurm:
echo SLURMD_OPTIONS="--conf-server `hostname -s`" > /etc/sysconfig/slurmd
The result can be incorrect, because inside the container hostname -s does not necessarily return the name of the head node. To be sure, edit the file /etc/sysconfig/slurmd. The file for my setup, where the Slurm server head node is named warewulf, should be:
SLURMD_OPTIONS=--conf-server warewulf
Be sure it points to the Slurm server.
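To avoid depending on what hostname -s returns inside the container, the server name can be passed explicitly. A short sketch, in which write_slurmd_options is a hypothetical helper and warewulf is the head node name used in this article:

```shell
#!/bin/bash
# Hypothetical helper: write the slurmd options file with an explicit server.
#   $1 = Slurm server (head node) name, $2 = file to write
write_slurmd_options() {
    printf 'SLURMD_OPTIONS=--conf-server %s\n' "$1" > "$2"
}

# Inside the container:
#   write_slurmd_options warewulf /etc/sysconfig/slurmd
```

The resulting file has the same single line shown above, so it can be checked with cat before leaving the container.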
Finally, enable the munge and slurmd services in the container:
systemctl enable munge
systemctl enable slurmd
Note that the service on the compute node is slurmd, not slurmctld as on the head node.
At this point, you can exit the container, which should save, before starting or restarting the compute node.
Once the compute node is rebooted, either SSH into it or log in directly. Check that the munge and slurmd services are running, then check the munge and slurmd details, particularly the permissions for:
/etc/munge
/etc/munge/munge.key
/etc/slurm
/etc/slurm/slurm.conf
/var/log/slurm
/var/spool/slurmd
/etc/sysconfig/slurmd
If you see any issues with permissions, check /etc/group and /etc/passwd on the compute node and compare them with the head node. To correct differences, edit these two files in the container on the head node, exit the container so the updates are saved, and reboot the compute node.
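Checking ownership and permissions one path at a time gets tedious, so a loop helps. The sketch below prints owner, group, and mode for each path; report_perms is a hypothetical helper, it relies on GNU stat, and the path list in the usage comment follows the locations used earlier in this article.

```shell
#!/bin/bash
# Hypothetical helper: print "owner:group mode path" for each argument.
report_perms() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            stat -c '%U:%G %a %n' "$p"
        else
            echo "MISSING $p"
        fi
    done
}

# On the compute node (as root, so that every path is reachable):
#   report_perms /etc/munge /etc/munge/munge.key /var/lib/munge \
#       /etc/slurm /etc/slurm/slurm.conf /var/log/slurm \
#       /var/spool/slurmd /etc/sysconfig/slurmd
```

Running the same command on the head node for the paths that exist there gives a side-by-side reference for the expected owners.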
If everything appears correct, go to the head node and run the command shown in Listing 8, which lists the nodes known by Slurm. The node is idle, so it is ready to run jobs.
Listing 8
List of Nodes Known by slurm
$ sinfo -a
PARTITION AVAIL  TIMELIMIT NODES STATE NODELIST
normal*      up 1-00:00:00     1  idle n0001
The first job I like to run is a single line that checks the hostnames of all nodes:
$ srun -n1 -l hostname [output]
If this works, it is time to move on to something more sophisticated and more like running an HPC job. I'll create a simple script that does nothing but sleep for 40 seconds:
#!/bin/bash
date
sleep 40
hostname -s
date
I called this script stall.sh because it does nothing but "stall" until the script finishes. You can think of this as the application script.
The second script contains the Slurm options and the srun command that launches the application script:
#!/bin/bash
#SBATCH --job-name=test_stall_job
#SBATCH --nodes=1
#SBATCH --output=test_stall_%j.log
srun /home/laytonjb/stall.sh
I don't want to go into too much detail, but the first line that begins #SBATCH defines the job name, the second line defines how many nodes to use (just one), and the third line defines the name of the file where the output is written (%j is replaced by the job ID). I called this file run_stall.sh.
After I submit the job to Slurm, Listing 9 shows the job queue and the nodes known by Slurm.
Listing 9
Job Queue and Known Nodes
$ sbatch run_stall.sh
Submitted batch job 20
[laytonjb@warewulf ~]$ squeue
 JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
    20    normal test_sta laytonjb  R  0:01     1 n0001
[laytonjb@warewulf ~]$ sinfo -a
PARTITION AVAIL  TIMELIMIT NODES STATE NODELIST
normal*      up 1-00:00:00     1 alloc n0001
The batch job id (20) is printed to the command line after running the sbatch command. The output of squeue shows the job id, the Slurm partition (called normal), the job name (truncated in the display), the user who submitted the job, the status (ST), the time it's been running, the number of nodes, and a list of the nodes that are being used under the label NODELIST.
When you run the sinfo -a command, it shows that the node is allocated (alloc), which means it has been allocated to a job.
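When test jobs are scripted, the job ID has to be captured from the sbatch message so that the queue can be polled and the right log file read. The following is a sketch only: job_id_from is a hypothetical helper that assumes the "Submitted batch job N" wording shown in Listing 9, and the log file name follows the --output pattern of run_stall.sh.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's confirmation message.
job_id_from() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Submit the job, wait for it to leave the queue, then show its log.
# Runs only where sbatch is installed and run_stall.sh is present.
if command -v sbatch >/dev/null 2>&1 && [ -f run_stall.sh ]; then
    id=$(job_id_from "$(sbatch run_stall.sh)")
    [ -n "$id" ] || exit 1
    # Poll only for pending, running, or completing states.
    while [ -n "$(squeue -h -j "$id" -t PD,R,CG 2>/dev/null)" ]; do
        sleep 5
    done
    cat "test_stall_${id}.log"
fi
```

For a one-off test, watching squeue by hand as in Listing 9 is just as good; the loop only pays off once several jobs are submitted from a script.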
Summary
Building on the last article [1], NTP was added to the Warewulf 4 cluster for precise time keeping, which is critical for all subsequent steps in building the cluster. The time zone was also set on the compute node (in the container) to match the head node. I'm not entirely sure if this is necessary, but it is something I like to do to make sure all the nodes have the same time zone.
The last thing added in this article was a resource manager (job scheduler) – in this case, Slurm. The installation approach I chose may be a bit fiddly, but it gets the job done. If you have another method that works for you, please use it.
On the head node you must configure the /etc/slurm/slurm.conf file, for which Jones' recipe has the commands. You also need to configure /etc/slurm/cgroup.conf by adding the one line at the end.
Moving to the compute node, the key to getting Slurm to work is to make sure the compute node, really the container, has the same GIDs and UIDs as the head node. It is also recommended you do this before you install the Slurm client into the container. Once this is done, installation is fairly straightforward.
Next, exec into the container, mounting the head node filesystem. From there, you will likely have to make a few directories and chown them to either munge:munge or slurm:slurm. Then, copy over /etc/slurm/slurm.conf and /etc/munge/munge.key from the head node to the container. Be sure these are owned by the correct UID:GID.
Remember that the compute node runs slurmd, whereas the head node runs slurmctld. Once this is done, you can start the compute node and run test jobs.
Other additions to your cluster could include installing environment modules with compilers and libraries or setting up GPUs on the head and compute nodes and configuring them in Slurm as a consumable resource [5].
Infos
[1] Nodes: https://www.admin-magazine.com/HPC/Articles/Warewulf-4
[2] chrony: https://en.wikipedia.org/wiki/Chrony
[3] Recipe for creating nodes with VMs: https://github.com/stanfordhpccenter/OpenHPC/blob/main/hpc-for-the-rest-of-us/recipes/rocky8/warewulf4/slurm/recipe.sh
[4] OpenHPC Slurm RPMs: https://openhpc.community
[5] Slurm consumable resource: https://www.admin-magazine.com/HPC/Articles/Warewulf-4-GPUs