Photo by Andrew Ly on Unsplash

Building an HPC cluster with Warewulf 4

Time and Resource Management

Article from ADMIN 74/2023
By
Warewulf installed with a compute node is not really an HPC cluster; you need to ensure precise timekeeping and add a resource manager.

A Warewulf-configured cluster head node with bootable, stateless compute nodes [1] is a first step in building a cluster. Although you can run jobs at this point, a few additions make the cluster far more functional. In this article, I'll show you how to configure timekeeping so that the head node and the compute node are in sync. This step is more important than some people realize. For example, I have seen Message Passing Interface (MPI) applications fail because the clocks on two of the nodes were far out of sync.

Next, you will want to install a resource manager (job scheduler), which lets you queue jobs so you don't have to sit at the terminal waiting for them to finish. It also lets you share the cluster with other users, with each job running when the resources it needs become available. In this example, I use Slurm. A resource manager is a key component of an HPC cluster.

NTP

A key tool for clusters is the Network Time Protocol (NTP), which syncs system clocks to each other, to a reference clock (a standard atomic clock or something close to it), or both. With clocks in sync, the many tools and libraries used on clusters, such as MPI, can function correctly.

On Rocky Linux 8, I use chrony [2] to sync clocks on both the client and the server. In the case of the cluster, the head node is a client to the outside world, but it will also act as a time server to the compute nodes within the cluster.

Installing chrony on the head node with yum or dnf is easy. During installation, a default /etc/chrony.conf configuration file is created, but I modified mine to keep it really simple:

server 2.rocky.pool.ntp.org
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 10.0.0.0/8
local
...
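
Because the head node also serves time to the compute nodes, each compute node needs a matching client configuration. The following is a minimal sketch, not the configuration used in the article: it writes a chrony client file that points at the head node. The address 10.0.0.1 (chosen only because it falls inside the allowed 10.0.0.0/8 range) and the output path ./chrony.conf.compute are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: generate a minimal chrony client config for the compute nodes.
# HEAD_NODE_IP and OUT are hypothetical placeholders, not values from the article.
HEAD_NODE_IP="${HEAD_NODE_IP:-10.0.0.1}"
OUT="${OUT:-./chrony.conf.compute}"

cat > "$OUT" <<EOF
# Use the cluster head node as the only time source
server ${HEAD_NODE_IP} iburst
driftfile /var/lib/chrony/drift
# Step the clock if it is off by more than 1 s, but only in the first three updates
makestep 1.0 3
# Keep the hardware clock in sync with the system clock
rtcsync
EOF

echo "wrote $OUT"
```

The generated file would still have to reach the compute node image as /etc/chrony.conf; how you do that depends on your Warewulf setup. Once chronyd is running on a compute node, chronyc sources should list the head node as its time source.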