Resource Management with Slurm
Slurm Job Scheduling System
squeue
To print a list of jobs in the job queue, or only the jobs for a particular user, use squeue. For example,
$ squeue -u akitzmiller
lists the jobs for a particular user.
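The default squeue listing can be narrowed further. The following is a minimal sketch, assuming you only want pending and running jobs and a slightly wider job name column. The variable names SQUEUE_FORMAT and TARGET_USER are illustrative and not part of Slurm, and the column selection is just one reasonable choice.

```shell
#!/bin/sh
# Columns: job ID, partition, job name, state, elapsed time, node count,
# and the node list (running jobs) or the pending reason (waiting jobs).
SQUEUE_FORMAT='%.10i %.9P %.20j %.8T %.10M %.6D %R'

# Illustrative default: use the first argument as the user name,
# otherwise fall back to whoever runs the script.
TARGET_USER="${1:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
    # -u selects the user, -t filters by job state, -o sets the output format.
    squeue -u "$TARGET_USER" -t PENDING,RUNNING -o "$SQUEUE_FORMAT"
else
    echo "squeue not found; run this on a node with the Slurm client tools" >&2
fi
```

Saved as a script, this could be run with a user name as its only argument, or with no argument to list your own jobs.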
sacct
The sacct command displays the accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database. You can also run the command against a specific job number:
$ sacct -j 999999
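By default, sacct prints a fixed set of columns. As a hedged sketch, the query for a single job can be limited to the fields you care about with the --format option. The field list below is only one plausible selection, and the names JOB_ID and SACCT_FIELDS are placeholders introduced for this example.

```shell
#!/bin/sh
# Illustrative default: the job number used in the example above.
JOB_ID="${1:-999999}"

# Accounting fields to show: job and step IDs, name, partition, final state,
# wall-clock time, peak resident memory per step, and exit code.
SACCT_FIELDS='JobID,JobName,Partition,State,Elapsed,MaxRSS,ExitCode'

if command -v sacct >/dev/null 2>&1; then
    # -j selects the job; --format chooses the columns to print.
    sacct -j "$JOB_ID" --format="$SACCT_FIELDS"
else
    echo "sacct not found; run this on a node with the Slurm client tools" >&2
fi
```

Whether values such as MaxRSS are populated depends on how job accounting is configured on the cluster, so empty columns do not necessarily indicate a problem with the job.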
Summary
A resource manager is one of the most critical pieces of software in HPC. It allows systems and their resources to be shared efficiently and is remarkably flexible, letting you create multiple queues according to resource type or generic resource (e.g., the GPUs in this article). Slurm also has job accounting by default.
The Slurm resource manager is one of the most common job schedulers in use today for very good reasons, some of which I covered here. Prepare to be "Slurmed."