![Photo by Andrew Ly on Unsplash](/var/ezflow_site/storage/images/archive/2023/74/building-a-hpc-cluster-with-warewulf-4/photobyandrewlyonunsplash_wolf.png/203484-1-eng-US/PhotobyAndrewLyonUnsplash_Wolf.png_medium.png)
Photo by Andrew Ly on Unsplash
Building an HPC cluster with Warewulf 4
Time and Resource Management
A Warewulf-configured cluster head node with bootable, stateless compute nodes [1] is a first step in building a cluster. Although you can run jobs at this point, a few additions make the cluster more functional. In this article, I'll show you how to configure time so that the head node and the compute node are in sync. This step is more important than some people realize. For example, I have seen Message Passing Interface (MPI) applications fail because the clocks on two of the nodes were far out of sync.
Next, you will want to install a resource manager (job scheduler) that allows you to queue up jobs, so you don't have to sit at the terminal waiting for them to finish. In this example, I use Slurm. A resource manager also lets you share the cluster with other users and runs each job when the resources it needs are available. This component is key to creating an HPC cluster.
NTP
One of the most important tools for clusters is the Network Time Protocol (NTP), which syncs system clocks to each other, to a standard atomic clock (or something close to it), or both. With clocks in sync, the many tools and libraries used on clusters, such as MPI, can function correctly.
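To make "in sync" concrete, here is a minimal sketch of the standard NTP on-wire calculation, in which a client estimates its clock offset from four timestamps. It illustrates the protocol in general and is not taken from chrony or from the setup in this article. The function name and the timestamp values are hypothetical.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0: client clock when the request is sent
    t1: server clock when the request arrives
    t2: server clock when the reply is sent
    t3: client clock when the reply arrives
    All values must use the same unit (microseconds in the example below).
    """
    # Offset of the server clock relative to the client clock,
    # assuming the network path is symmetric.
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    # Time spent on the network, excluding server processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: the client clock is 0.5 s behind the server,
# each network hop takes 200 us, and the server needs 50 us to reply.
offset, delay = ntp_offset_and_delay(
    t0=1_000_000, t1=1_500_200, t2=1_500_250, t3=1_000_450
)
print(f"offset={offset} us, round-trip delay={delay} us")
```

The estimate is only as good as the symmetry assumption. On a cluster's private network, the path between head node and compute nodes is short and usually symmetric, which is one reason a local time server works well.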
On Rocky Linux 8, I use chrony [2] to sync clocks on both the client and the server. In the case of the cluster, the head node is a client to the outside world, but it will also act as a time server to the compute nodes within the cluster.
Installing chrony on the head node with yum or dnf is easy. During installation, a default /etc/chrony.conf configuration file is created, but I modified mine to keep it really simple:
    server 2.rocky.pool.ntp.org
    driftfile /var/lib/chrony/drift
    makestep 1.0 3
    rtcsync
    allow 10.0.0.0/8
    local...
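The allow line is what lets the head node serve time to the compute nodes. As a rough sanity check, the sketch below reads the allow directives from a chrony-style configuration and tests whether a given address falls inside one of them. It uses only the directives visible in the listing above; the truncated final line is left out rather than guessed. The helper names and the example addresses are hypothetical, and the parser handles only the CIDR form of allow, ignoring deny rules and the other forms chrony accepts.

```python
import ipaddress

# Only the complete directives from the listing above; the file continues
# past this point in the original and is intentionally not reconstructed.
CHRONY_CONF = """\
server 2.rocky.pool.ntp.org
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 10.0.0.0/8
"""


def allowed_networks(conf_text):
    """Return the subnets named in 'allow <CIDR>' directives."""
    networks = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Skip blank lines and '#' comments.
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "allow" and len(fields) > 1:
            networks.append(ipaddress.ip_network(fields[1], strict=False))
    return networks


def may_query(conf_text, client_ip):
    """True if client_ip lies inside any allowed subnet (deny rules ignored)."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed_networks(conf_text))


# Hypothetical addresses: one on the cluster network, one outside it.
print(may_query(CHRONY_CONF, "10.0.0.2"))
print(may_query(CHRONY_CONF, "192.168.1.5"))
```

To check a real head node, read the text of /etc/chrony.conf and pass it in place of CHRONY_CONF.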