Building an HPC cluster with Warewulf 4

Time and Resource Management

Article from ADMIN 74/2023
By Jeff Layton
Warewulf installed with a compute node is not really an HPC cluster; you need to ensure precise time keeping and add a resource manager.

A Warewulf-configured cluster head node with bootable, stateless compute nodes [1] is a first step in building a cluster. Although you can run jobs at this point, some additions need to be made to make it more functional. In this article, I'll show you how to configure time so that the head node and the compute node are in sync. This step is more important than some people realize. For example, I have seen Message Passing Interface (MPI) applications that have failed because the clocks on two of the nodes were far out of sync.

Next, you will want to install a resource manager (job scheduler) that allows you to queue up jobs, so you don't have to sit at the terminal waiting for jobs to finish. In this example, I use Slurm. You can also share the cluster with other users and have jobs run when the resources you need are available. This component is a key to creating an HPC cluster.

NTP

One of the absolute key tools for clusters is the Network Time Protocol (NTP), which syncs the system clocks to each other, to a standard atomic clock (or something close to it), or both. With clocks in sync, tools and libraries that run across nodes, such as MPI, will function correctly.

On Rocky Linux 8, I use chrony [2] to sync clocks on both the client and the server. In the case of the cluster, the head node is a client to the outside world, but it will also act as a time server to the compute nodes within the cluster.

Installing chrony on the head node with yum or dnf is easy. During installation, a default /etc/chrony.conf configuration file is created, but I modified mine to keep it really simple:

server 2.rocky.pool.ntp.org
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 10.0.0.0/8
local stratum 10
keyfile /etc/chrony.keys
leapsectz right/UTC
logdir /var/log/chrony

I pointed the head node to 2.rocky.pool.ntp.org as the source of time updates in the outside world (this came with the default). I also allowed the head node to be used by the IP addresses in the range 10.0.0.0/8.

After you edit the file, you should restart the chrony service:

$ sudo systemctl restart chronyd

I also like to make sure it will start automatically on boot, so I run:

$ sudo systemctl enable chronyd

Although it is probably not necessary, because it was enabled when installed, I like to be sure. I usually also check that it's running with systemctl when I restart the head node.
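
If you want a bit more detail than systemctl provides, chrony ships its own control utility, chronyc. This is only an optional sanity check on the head node, not a required step: chronyc tracking reports how far the clock is from its source, and chronyc sources -v lists the servers chrony is using.

$ chronyc tracking
$ chronyc sources -v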

At this point you can test whether the clock is synchronized by installing the ntpstat utility on the head node and then running it:

$ sudo yum install ntpstat
$ ntpstat
synchronised to NTP server (162.159.200.123) at stratum 4
   time correct to within 21 ms
   polling server every 64 s

Your output will not match this exactly, but you can see that it's using an outside source to synchronize the clock.

Configuring time on the compute node is a bit different from the head node and requires a few more steps. The first difference is that the compute node image has no time zone set, and I like to keep the compute nodes' configuration as close as possible to the head node's. You can't simply set the time zone inside the container, because the container is not a running system. You can either set the time zone manually on the compute node before running any jobs, or you can create a simple systemd service that sets it at startup. I'm going to choose the second approach to automate things.

To create a simple script that is run by the system when the node starts but after the network is up, you should put the script where changes local to the node should reside: in /usr/local/bin. Begin by exec-ing into the container:

$ sudo wwctl container exec rocky-8 /bin/bash
[rocky-8] Warewulf>

Next, create a script in /usr/local/bin/ (I named mine timezone_fix.sh):

#!/bin/bash
timedatectl set-timezone America/New_York

Adjust the time zone value for your cluster. (You can find the time zone of your head node with the command timedatectl.) In my case, it is America/New_York. The command timedatectl set-timezone sets the time zone. Be sure to make the script executable:
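
If you don't know the exact string for your time zone, timedatectl can list the valid names. A quick check on the head node (here I filter for mine):

$ timedatectl list-timezones | grep America/New
America/New_York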

[rocky-8] Warewulf> chmod u+x /usr/local/bin/timezone_fix.sh

After creating that script, create the systemd service file /etc/systemd/system/run-at-startup.service that runs it, so the system knows about the script:

[Unit]
Description=Script to set time zone to EDT
[Service]
Type=simple
RemainAfterExit=no
ExecStart=/usr/local/bin/timezone_fix.sh
TimeoutStartSec=0
[Install]
WantedBy=default.target

The final step is to enable the run-at-startup service:

[rocky-8] Warewulf> systemctl enable run-at-startup.service
Created symlink /etc/systemd/system/default.target.wants/run-at-startup.service → /etc/systemd/system/run-at-startup.service.

With these additions, the time zone in the compute node will match the head node. Again, I don't think it's strictly required, but I like to have it. Now you can install chrony into the compute node container, as you did for the head node:

$ yum install chrony ntpstat

The /etc/chrony.conf file for the compute node is similar to that of the head node:

server 10.0.0.1
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 10.0.0.0/8
local stratum 10
keyfile /etc/chrony.keys
leapsectz right/UTC
logdir /var/log/chrony

For the compute node, I just point chrony to the head node (warewulf, IP 10.0.0.1) as the time server, because the head node itself syncs to an outside NTP server. Strictly speaking, you really only need the head node and compute nodes to be in sync with each other, but you might as well sync the head node to the true time outside of the cluster.

To make sure chrony starts when the container boots, I enable the service inside the container,

$ systemctl enable chronyd

and type exit to leave the container. Be sure Warewulf rebuilds the container image when you exit; otherwise, you have to go back and redo everything. To make sure NTP is working, boot the compute node and run timedatectl (Listing 1).

Listing 1

Running timedatectl

$ ssh n0001
[laytonjb@n0001 ~]$ timedatectl
               Local time: Sat 2022-12-17 11:31:26 EST
           Universal time: Sat 2022-12-17 16:31:26 UTC
                 RTC time: Sat 2022-12-17 16:31:26
                Time zone: America/New_York (EST, -0500)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

Everything looks good at this point. The time zone matches the head node, and the node is time syncing. Another thing to look at is the ntpstat output:

$ ntpstat
synchronised to NTP server (10.0.0.1) at stratum 5
   time correct to within 48 ms
   polling server every 64 s

The NTP server is correct and all looks good. On to the next step!
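
As an optional cross-check from the head node side, chrony can list the clients that have asked it for time, which should include your compute nodes once they have booted:

$ sudo chronyc clients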

Slurm

Now that time is synchronized between the head node and compute nodes, I like to install the resource manager (aka, the job scheduler). I chose Slurm for my cluster because it is so ubiquitous, but you have several others from which to choose.

I must admit that I had a difficult time getting Slurm to run with my installation method. The method itself might be part of the problem, although I'm fairly sure the fault was mine. Regardless, with the help of several people on the mailing lists, I got it running.

The process I followed comes from a recipe by Steve Jones of Stanford University, written for a system that creates nodes as virtual machines (VMs) [3] but usable as a template for physical nodes. I didn't use the entire recipe, only the parts near the end that apply to installing and configuring Slurm.

His recipe uses the Slurm RPMs from OpenHPC [4], which I like using for several reasons: They are cluster oriented; OpenHPC will be switching to Warewulf 4 soon, so they have preliminary binaries; and I didn't have to build Slurm from scratch. The first step in using these RPMs is to add the OpenHPC repository. After a little hunting I found the release file and installed it on the head node:

$ sudo yum install http://repos.openhpc.community/OpenHPC/2/EL_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm

After installing the release RPM, I installed munge, the Slurm authentication tool (Listing 2), then I installed the Slurm server on the head node (Listing 3).

Listing 2

Installing munge

$ sudo yum install munge
Last metadata expiration check: 0:03:14 ago on Sun 04 Dec 2022 08:29:36 AM EST.
Dependencies resolved.
================================================================================================
 Package                Architecture       Version                  Repository             Size
================================================================================================
Installing:
 munge                  x86_64             0.5.13-2.el8             appstream             121 k
Installing dependencies:
 munge-libs             x86_64             0.5.13-2.el8             appstream              29 k
Transaction Summary
================================================================================================
Install  2 Packages
...

Listing 3

Install Slurm Server

$ sudo yum install ohpc-slurm-server
Last metadata expiration check: 0:22:28 ago on Sun 04 Dec 2022 08:29:36 AM EST.
Dependencies resolved.
===============================================================================================
 Package                        Arch       Version                   Repository           Size
===============================================================================================
Installing:
 ohpc-slurm-server              x86_64     2.6-7.1.ohpc.2.6          OpenHPC-updates     7.0 k
Installing dependencies:
 mariadb-connector-c            x86_64     3.1.11-2.el8_3            appstream           199 k
 mariadb-connector-c-config     noarch     3.1.11-2.el8_3            appstream            14 k
 ohpc-filesystem                noarch     2.6-2.3.ohpc.2.6          OpenHPC-updates     8.0 k
 pdsh-mod-slurm-ohpc            x86_64     2.34-9.1.ohpc.2.6         OpenHPC-updates      13 k
 slurm-devel-ohpc               x86_64     22.05.2-14.1.ohpc.2.6     OpenHPC-updates      83 k
 slurm-example-configs-ohpc     x86_64     22.05.2-14.1.ohpc.2.6     OpenHPC-updates     242 k
 slurm-ohpc                     x86_64     22.05.2-14.1.ohpc.2.6     OpenHPC-updates      18 M
 slurm-perlapi-ohpc             x86_64     22.05.2-14.1.ohpc.2.6     OpenHPC-updates     822 k
 slurm-slurmctld-ohpc           x86_64     22.05.2-14.1.ohpc.2.6     OpenHPC-updates     1.5 M
 slurm-slurmdbd-ohpc            x86_64     22.05.2-14.1.ohpc.2.6     OpenHPC-updates     836 k
Transaction Summary
===============================================================================================
Install  11 Packages
...

Everything should go fine through this step. If you have hiccups, I recommend posting to the slurm-users mailing list, the warewulf mailing list, or both.

Next, you need to create and edit the slurm.conf file. The Slurm server installation places some template files in /etc/slurm/ that you will use.

For now, copy the OpenHPC template file slurm.conf.ohpc to slurm.conf:

$ sudo cp /etc/slurm/slurm.conf.ohpc /etc/slurm/slurm.conf

Jones's recipe uses some Perl commands to edit that file on the head node (the Slurm server) (Listing 4). The commands are fairly easy to understand, even if you don't know Perl. In the second and third commands, I changed the compute node name to match my node (n0001).

Listing 4

Edit the Template File

$ sudo perl -pi -e "s/ControlMachine=\S+/ControlMachine=`hostname -s`/" /etc/slurm/slurm.conf
$ sudo perl -pi -e "s/^NodeName=(\S+)/NodeName=n0001/" /etc/slurm/slurm.conf
$ sudo perl -pi -e "s/^PartitionName=normal Nodes=(\S+)/PartitionName=normal Nodes=n0001/" /etc/slurm/slurm.conf
$ sudo perl -pi -e "s/ Nodes=c\S+ / Nodes=ALL /" /etc/slurm/slurm.conf
$ sudo perl -pi -e "s/ReturnToService=1/ReturnToService=2/" /etc/slurm/slurm.conf

You should also set the munge and slurmctld services to start when the head node boots:

$ sudo systemctl enable --now munge
$ sudo systemctl enable --now slurmctld

A few more modifications need to be made on the Slurm head node. Edit the lines in the /etc/slurm/slurm.conf file as follows:

...
SlurmctldAddr=10.0.0.1
...
SlurmctldLogFile=/var/log/slurm/slurmctld.log
...
SlurmdLogFile=/var/log/slurm/slurmd.log
...

The first line points to the head node's IP address. The other two lines tell Slurm where to write the logs. If the log directory doesn't exist, you will have to create it and chown it to slurm:slurm:

$ sudo mkdir /var/log/slurm
$ sudo chown slurm:slurm /var/log/slurm

Another edit you will have to make in /etc/slurm/slurm.conf is to the line that begins NodeName=. It should reflect the node name, the number of sockets, the number of cores per socket, and the number of threads per core. For me, this line is

NodeName=n0001 Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN

for my compute node because I have a single socket with a four-core processor that has hyperthreading turned on (two threads per core). You should change this line to reflect your compute node.
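
If you are not sure of the socket/core/thread layout of your compute node, lscpu run on the node reports the values you need (a quick sketch):

$ ssh n0001 lscpu | grep -E "Socket|Core|Thread"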

It's also good to check that the directory /var/lib/munge exists (it should). The owner should be munge:munge on the directory and any files in there. If the directory doesn't exist, that is a problem and you need to create it, chown it to munge:munge, and reinstall the OpenHPC Slurm server RPM. If you see the file /var/lib/munge/munge.seed and it is owned by munge:munge, you should be good.

Also check that the directory /var/spool/slurmctld exists and is owned by slurm:slurm. A number of files in that directory should also be owned by slurm:slurm. If you don't see the directory or the files, create the directory, chown it to slurm:slurm, and reinstall the OpenHPC Slurm server RPM.
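
A quick way to run both of these ownership checks at once (just a sketch of what I describe above):

$ ls -ld /var/lib/munge /var/spool/slurmctld
$ sudo ls -l /var/lib/munge /var/spool/slurmctld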

The next thing to check is for the existence of the file /etc/slurm/cgroup.conf. If the file doesn't exist, there might be a file named cgroup.conf.example in that same directory. If so, copy it to cgroup.conf:

$ sudo cp /etc/slurm/cgroup.conf.example /etc/slurm/cgroup.conf

Now add a single line to the end of /etc/slurm/cgroup.conf so the file looks like Listing 5. This last line is what stumped me for a while until Jason Stover helped (thanks Jason!).

Listing 5

Modification for /etc/slurm/cgroup.conf

$ sudo more /etc/slurm/cgroup.conf
###
#
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
#--
CgroupAutomount=yes
ConstrainCores=no
ConstrainRAMSpace=no
CgroupMountpoint=/sys/fs/cgroup

I realize this seems like quite a bit of fiddling, but this is what I had to do to get Slurm to work, and it's not bad because it only needs to be done once. You can choose to build and install Slurm yourself or use different RPMs.

Now I come to the fun part, the compute node, which is a bit different from the head node and requires maybe a little more fiddling around; in reality, you only have to do this once per container. You can even script this if you like, especially if you are going to use several containers (I sketch such a script after walking through the steps).

The first step is to create the users slurm and munge in the container along with their groups before installing anything. This part is very critical: the user ID (UID) and group ID (GID) of the slurm and munge users and groups in the container must match those on the head node. On my head node, the group entries for slurm and munge are:

munge:x:970:
slurm:x:202:

The entries for the corresponding users are:

munge:x:972:970:Runs Uid 'N' Gid Emporium:/var/run/munge:/sbin/nologin
slurm:x:202:202:SLURM resource manager:/etc/slurm:/sbin/nologin
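
An easy way to pull these entries on the head node is getent, although reading /etc/group and /etc/passwd directly works just as well:

$ getent group munge slurm
$ getent passwd munge slurm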

Write down the GIDs and names and UIDs and names, and then exec into the container and mount the host filesystem in the container. Once in the container, you can create the appropriate groups and users (Listing 6).

Listing 6

Mount the Host Filesystem

$ sudo wwctl container exec --bind /:/mnt rocky-8 /bin/bash
[rocky-8] Warewulf> groupadd -g 970 munge
[rocky-8] Warewulf> groupadd -g 202 slurm
[rocky-8] Warewulf> useradd -g 970 -u 972 munge
[rocky-8] Warewulf> useradd -g 202 -u 202 slurm

Double-check these against the head node; this is a very important step. Don't exit the container just yet. You need to install the OpenHPC release RPM in the container to use the OpenHPC packages:

[rocky-8] Warewulf> yum install http://repos.openhpc.community/OpenHPC/2/EL_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm

Next, you can install the OpenHPC Slurm client (Listing 7). As on the head node, you then need to fix up the Slurm client installation.

Listing 7

Install OpenHPC Slurm Client

[rocky-8] Warewulf> yum install ohpc-slurm-client
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 0:04:01 ago on Sun Dec 11 14:26:55 2022.
Dependencies resolved.
===============================================================================================
 Package                    Arch   Version                               Repository       Size
===============================================================================================
Installing:
 ohpc-slurm-client          x86_64 2.6-7.1.ohpc.2.6                      OpenHPC-updates 6.9 k
Installing dependencies:
 cairo                      x86_64 1.15.12-6.el8                         appstream       718 k
 dejavu-fonts-common        noarch 2.35-7.el8                            baseos           73 k
 dejavu-sans-fonts          noarch 2.35-7.el8                            baseos          1.5 M
 fontconfig                 x86_64 2.13.1-4.el8                          baseos          273 k
 fontpackages-filesystem    noarch 1.44-22.el8                           baseos           15 k
 freetype                   x86_64 2.9.1-9.el8                           baseos          393 k
 groff-base                 x86_64 1.22.3-18.el8                         baseos          1.0 M
 hwloc-ohpc                 x86_64 2.7.0-3.9.ohpc.2.6                    OpenHPC-updates 2.6 M
 libX11                     x86_64 1.6.8-5.el8                           appstream       610 k
 libX11-common              noarch 1.6.8-5.el8                           appstream       157 k
 libXau                     x86_64 1.0.9-3.el8                           appstream        36 k
 libXext                    x86_64 1.3.4-1.el8                           appstream        44 k
 libXrender                 x86_64 0.9.10-7.el8                          appstream        32 k
 libpng                     x86_64 2:1.6.34-5.el8                        baseos          125 k
 libxcb                     x86_64 1.13.1-1.el8                          appstream       228 k
 mariadb-connector-c        x86_64 3.1.11-2.el8_3                        appstream       199 k
 mariadb-connector-c-config noarch 3.1.11-2.el8_3                        appstream        14 k
 munge                      x86_64 0.5.13-2.el8                          appstream       121 k
 munge-libs                 x86_64 0.5.13-2.el8                          appstream        29 k
 numactl-libs               x86_64 2.0.12-13.el8                         baseos           35 k
 ohpc-filesystem            noarch 2.6-2.3.ohpc.2.6                      OpenHPC-updates 8.0 k
 perl-Carp                  noarch 1.42-396.el8                          baseos           29 k
 perl-Data-Dumper           x86_64 2.167-399.el8                         baseos           57 k
 perl-Digest                noarch 1.17-395.el8                          appstream        26 k
 perl-Digest-MD5            x86_64 2.55-396.el8                          appstream        36 k
 perl-Encode                x86_64 4:2.97-3.el8                          baseos          1.5 M
 perl-Errno                 x86_64 1.28-421.el8                          baseos           75 k
 perl-Exporter              noarch 5.72-396.el8                          baseos           33 k
 perl-File-Path             noarch 2.15-2.el8                            baseos           37 k
 perl-File-Temp             noarch 0.230.600-1.el8                       baseos           62 k
 perl-Getopt-Long           noarch 1:2.50-4.el8                          baseos           62 k
 perl-HTTP-Tiny             noarch 0.074-1.el8                           baseos           57 k
 perl-IO                    x86_64 1.38-421.el8                          baseos          141 k
 perl-MIME-Base64           x86_64 3.15-396.el8                          baseos           30 k
 perl-Net-SSLeay            x86_64 1.88-2.module+el8.6.0+957+15d660ad    appstream       378 k
 perl-PathTools             x86_64 3.74-1.el8                            baseos           89 k
 perl-Pod-Escapes           noarch 1:1.07-395.el8                        baseos           19 k
 perl-Pod-Perldoc           noarch 3.28-396.el8                          baseos           85 k
 perl-Pod-Simple            noarch 1:3.35-395.el8                        baseos          212 k
 perl-Pod-Usage             noarch 4:1.69-395.el8                        baseos           33 k
 perl-Scalar-List-Utils     x86_64 3:1.49-2.el8                          baseos           67 k
 perl-Socket                x86_64 4:2.027-3.el8                         baseos           58 k
 perl-Storable              x86_64 1:3.11-3.el8                          baseos           97 k
 perl-Term-ANSIColor        noarch 4.06-396.el8                          baseos           45 k
 perl-Term-Cap              noarch 1.17-395.el8                          baseos           22 k
 perl-Text-ParseWords       noarch 3.30-395.el8                          baseos           17 k
 perl-Text-Tabs+Wrap        noarch 2013.0523-395.el8                     baseos           23 k
 perl-Time-Local            noarch 1:1.280-1.el8                         baseos           32 k
 perl-URI                   noarch 1.73-3.el8                            appstream       115 k
 perl-Unicode-Normalize     x86_64 1.25-396.el8                          baseos           81 k
 perl-constant              noarch 1.33-396.el8                          baseos           24 k
 perl-interpreter           x86_64 4:5.26.3-421.el8                      baseos          6.3 M
 perl-libnet                noarch 3.11-3.el8                            appstream       120 k
 perl-libs                  x86_64 4:5.26.3-421.el8                      baseos          1.6 M
 perl-macros                x86_64 4:5.26.3-421.el8                      baseos           71 k
 perl-parent                noarch 1:0.237-1.el8                         baseos           19 k
 perl-podlators             noarch 4.11-1.el8                            baseos          117 k
 perl-threads               x86_64 1:2.21-2.el8                          baseos           60 k
 perl-threads-shared        x86_64 1.58-2.el8                            baseos           47 k
 pixman                     x86_64 0.38.4-2.el8                          appstream       256 k
 slurm-contribs-ohpc        x86_64 22.05.2-14.1.ohpc.2.6                 OpenHPC-updates  22 k
 slurm-example-configs-ohpc x86_64 22.05.2-14.1.ohpc.2.6                 OpenHPC-updates 242 k
 slurm-ohpc                 x86_64 22.05.2-14.1.ohpc.2.6                 OpenHPC-updates  18 M
 slurm-pam_slurm-ohpc       x86_64 22.05.2-14.1.ohpc.2.6                 OpenHPC-updates 172 k
 slurm-slurmd-ohpc          x86_64 22.05.2-14.1.ohpc.2.6                 OpenHPC-updates 767 k
Installing weak dependencies:
 perl-IO-Socket-IP          noarch 0.39-5.el8                            appstream        46 k
 perl-IO-Socket-SSL         noarch 2.066-4.module+el8.6.0+957+15d660ad   appstream       297 k
 perl-Mozilla-CA            noarch 20160104-7.module+el8.6.0+965+850557f9
                                                                         appstream        14 k
Transaction Summary
===============================================================================================
Install  69 Packages
Total download size: 40 M
Installed size: 144 M
Is this ok [y/N]:
Downloading Packages:
...

To begin, edit the munge configuration. The steps here are in no specific order; start by checking that the directory /etc/munge exists. If not, create it and chown it to munge:munge:

[rocky-8] Warewulf> mkdir /etc/munge
[rocky-8] Warewulf> chown munge:munge /etc/munge

Copy the file /etc/munge/munge.key from the head node to the container, which won't be difficult because you mounted the head node filesystem when you exec'd into the container:

[rocky-8] Warewulf> cp /mnt/etc/munge/munge.key /etc/munge/munge.key
cp: overwrite '/etc/munge/munge.key'? y
[rocky-8] Warewulf> ls -lstar /etc/munge/munge.key
4 -r-------- 1 munge munge 1024 Dec 11 14:46 /etc/munge/munge.key

If it asks you whether you want to overwrite the existing munge.key, choose y. Be sure the directory and the file are all owned by munge:munge (UID:GID).

Next, you need to chown the directory /var/lib/munge to munge:munge:

[rocky-8] Warewulf> chown munge:munge /var/lib/munge

Now turn your attention to configuring Slurm in the container by creating the directory /var/spool/slurmd and chown it to slurm:slurm:

[rocky-8] Warewulf> mkdir /var/spool/slurmd
[rocky-8] Warewulf> chown slurm:slurm /var/spool/slurmd

Notice that the directory for the client is slurmd and not slurmctld, which is for the Slurm server.

Next, copy the slurm.conf file from the head node:

[rocky-8] Warewulf> cp /mnt/etc/slurm/slurm.conf /etc/slurm/slurm.conf

You shouldn't have to change anything in the file /etc/slurm/slurm.conf once it is in the container.

Next, create the log directory for Slurm in the container and chown it to slurm:slurm:

[rocky-8] Warewulf> mkdir /var/log/slurm
[rocky-8] Warewulf> chown slurm:slurm /var/log/slurm

One other thing you should use from Jones' recipe is the following command inside the container that sets options for starting Slurm:

echo SLURMD_OPTIONS="--conf-server `hostname -s`" > /etc/sysconfig/slurmd

The result can be incorrect (the embedded hostname -s expands wherever the command runs, which is not necessarily the name of the Slurm server). To be sure, check and edit the file /etc/sysconfig/slurmd. For my setup, where the Slurm server head node is named warewulf, it should read:

SLURMD_OPTIONS=--conf-server warewulf

Be sure it points to the Slurm server.

Finally, enable the services in the container for munge and slurm:

[rocky-8] Warewulf> systemctl enable munge
[rocky-8] Warewulf> systemctl enable slurmd

Note that on the compute node the service is slurmd, not slurmctld as on the head node.

At this point, you can exit the container, which should save and rebuild the image, before starting or restarting the compute node.
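
As mentioned earlier, if you are going to prepare several containers, the in-container steps collapse nicely into a script. The following is only a sketch of what I did above (the UIDs, GIDs, head node name warewulf, and paths are the ones from my cluster, so adjust them for yours), and it assumes you are inside the container with the head node filesystem bound at /mnt:

#!/bin/bash
# Sketch: Slurm client prep inside the Warewulf container.
# Assumes: wwctl container exec --bind /:/mnt rocky-8 /bin/bash
# Match the UIDs/GIDs to YOUR head node before running.
set -e
groupadd -g 970 munge
groupadd -g 202 slurm
useradd -g 970 -u 972 munge
useradd -g 202 -u 202 slurm
yum -y install http://repos.openhpc.community/OpenHPC/2/EL_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm
yum -y install ohpc-slurm-client
mkdir -p /etc/munge /var/spool/slurmd /var/log/slurm
cp -f /mnt/etc/munge/munge.key /etc/munge/munge.key
cp -f /mnt/etc/slurm/slurm.conf /etc/slurm/slurm.conf
chown -R munge:munge /etc/munge /var/lib/munge
chown -R slurm:slurm /var/spool/slurmd /var/log/slurm
echo 'SLURMD_OPTIONS="--conf-server warewulf"' > /etc/sysconfig/slurmd
systemctl enable munge slurmd

If the image does not rebuild automatically when you exit, you can force a rebuild from the head node with sudo wwctl container build rocky-8.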

Once the compute node is rebooted, either SSH into it or log in directly. Check that the munge and slurmd services are running, then check the munge and Slurm details, particularly the ownership and permissions of (a quick check is sketched after this list):

  • /etc/munge
  • /etc/munge/munge.key
  • /var/lib/munge
  • /etc/slurm
  • /etc/slurm/slurm.conf
  • /var/log/slurm
  • /var/spool/slurmd
  • /etc/sysconfig/slurmd
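
A quick way to run through these checks on the compute node (just a sketch):

$ systemctl is-active munge slurmd
$ ls -ld /etc/munge /var/lib/munge /var/spool/slurmd /var/log/slurm
$ sudo ls -l /etc/munge/munge.key /etc/slurm/slurm.conf /etc/sysconfig/slurmd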

If you see any issues with permissions, check /etc/group and /etc/passwd on the compute node and compare them with the head node. To correct differences, edit these two files in the container on the head node, exit the container so the updates are saved, and reboot the compute node.

If everything appears correct, go to the head node and run the command shown in Listing 8, which lists the nodes known by slurm. The node is idle, so it is ready to run jobs.

Listing 8

List of Nodes Known by slurm

$ sinfo -a
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 1-00:00:00      1   idle n0001

The first job I like to run is a single line that checks the hostnames of all nodes:

$ srun -n1 -l hostname
[output]

If this works, it is time to move on to something more sophisticated and more like running an HPC job. I'll create a simple script that does nothing but sleep for 40 seconds:

#!/bin/bash
date
sleep 40
hostname -s
date

I called this script stall.sh because it does nothing but "stall" until the script finishes. You can think of this as the application script.

The second script contains the Slurm options and the srun command:

#!/bin/bash
#SBATCH --job-name=test_stall_job
#SBATCH --nodes=1
#SBATCH --output=test_stall_%j.log
srun /home/laytonjb/stall.sh

I don't want to go into too much detail, but the first SBATCH line defines the job name, the second defines how many nodes to use (just one), and the third defines the file where the output is written (%j is replaced by the job ID). I called this file run_stall.sh.

When I submit the job to Slurm, Listing 9 shows the submission, the job queue, and the nodes known by Slurm.

Listing 9

Job Queue and Known Nodes

$ sbatch run_stall.sh
Submitted batch job 20
[laytonjb@warewulf ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                20    normal test_sta laytonjb  R       0:01      1 n0001
[laytonjb@warewulf ~]$ sinfo -a
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 1-00:00:00      1  alloc n0001

The batch job id (20) is printed to the command line after running the sbatch command. The output of squeue shows the job id, the slurm partition (called normal), the user who submitted the job, the status (ST), the time it's been running, and a list of the nodes that are being used under the label NODELIST.

When you run the sinfo -a command, it shows that the node is allocated (alloc), which means it is being allocated to a job.
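
If you want more detail about the job while it is running, or its output once it finishes, scontrol and the logfile created by the --output pattern (test_stall_<jobid>.log) are the places to look. For the job above:

$ scontrol show job 20
$ cat test_stall_20.log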

Summary

Building on the last article [1], NTP was added to the Warewulf 4 cluster for precise timekeeping, which is critical for all subsequent steps in building the cluster. The time zone in the compute node (container) was also set to match the head node. I'm not entirely sure this is necessary, but I like to make sure all the nodes have the same time zone.

The last thing added in this article was a resource manager (job scheduler) – in this case, Slurm. The installation approach I chose may be a bit fiddly, but it gets the job done. If you have another method that works for you, please use it.

On the head node you must configure the /etc/slurm/slurm.conf file, for which Jones' recipe has the commands. You also need to configure /etc/slurm/cgroup.conf by adding the one line at the end.

Moving to the compute node, the key to getting Slurm to work is to make sure the compute node, really the container, has the same GIDs and UIDs as the head node. It is also recommended you do this before you install the Slurm client into the container. Once this is done, installation is fairly straightforward.

Next, exec into the container, mounting the head node filesystem. From there, you will likely have to make a few directories and chown them to either munge:munge or slurm:slurm. Then, copy over /etc/slurm/slurm.conf and /etc/munge/munge.key from the head node to the container. Be sure these are owned by the correct UID:GID.

Remember that the compute node runs slurmd, whereas the head node runs slurmctld. Once this is done, you can start the compute node and run test jobs.

Other additions to your cluster could include installing environment modules with compilers and libraries or setting up GPUs on the head and compute nodes and configuring them in Slurm as a consumable resource [5].

The Author

Jeff Layton has been in the HPC business for over 30 years (starting when he was 4 years old). When he's not grappling with a stubborn systemd script, he's looking for deals for his home cluster. His Twitter handle is @JeffdotLayton.
