Environment Modules Using Lmod
One of the key tools for any cluster is environment modules, which allow you to define your user environment and the set of tools you want to use to build and execute your application. They provide a simple way to manage dynamically the huge number of combinations that result from various versions of tools and libraries.
The original implementation, Environment Modules Tcl/C (Tcl stands for tool command language), has been around since the early 1990s. High-performance computing (HPC) sites have been using it to allow users to specify the combination of tools and libraries they want to use.
One implementation of environment modules, Lmod, is under constant development and has some unique features that can be very useful, even on your desktop if you write code and want to use a variety of tools and libraries. I use Lmod on my desktop and laptop to try new compilers or new versions of compilers, as well as new versions of libraries.
Fundamentals of Environment Modules
Programmers use a number of compilers; MPI, compute, and other libraries; and various tools to write applications. For example, someone might write Fortran with OpenACC to target GPUs using the PGI compilers, along with Open MPI, whereas another person might use the GNU 8.1 compilers with MPICH. One user might use the Portable, Extensible Toolkit for Scientific Computation (PETSc) to solve their problem, and another user might use OpenBLAS.
Environment module tools allow users and developers to specify the exact set of tools they want or need and are key to operating an effective HPC system. “Effective” can mean better performance (tools that allow your code to run as fast as possible), more flexibility (tools that match a specific case), or ease of environment configuration.
To explain a little deeper, assume you have five versions of the GNU compiler (4.8, 5.4, 6.2, 7.3, and 8.1), the latest Intel compiler, the last three community versions of the PGI compilers, three versions of MPICH (3.2.1, 3.1.4, and 3.1), and three versions of Open MPI (2.1, 3.0, and 3.1). Furthermore, assume your users need two versions of PETSc and two versions of OpenBLAS. Altogether you have 216 possible combinations (nine compilers, six MPI libraries, two PETSc versions, and two OpenBLAS versions). This is a very large number of combinations, and forcing users to configure their environment, primarily paths, to use the combination they want or need is bordering on HPC torture. At this point, you have two choices: You can reduce the number of combinations (e.g., get rid of some of the compilers and MPI libraries), or you can use environment modules so users can choose the combination of compiler, MPI library, and computational libraries they want or need.
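The arithmetic above can be checked with a few lines of Python. This is only a counting sketch: the GNU, MPICH, and Open MPI version numbers come from the text, whereas the Intel, PGI, PETSc, and OpenBLAS labels are placeholders I made up because the text does not name those versions.

```python
from itertools import product

# Version numbers for GNU, MPICH, and Open MPI are taken from the text.
# The labels for Intel, PGI, PETSc, and OpenBLAS are placeholders.
compilers = (
    [f"gnu/{v}" for v in ("4.8", "5.4", "6.2", "7.3", "8.1")]
    + ["intel/latest"]
    + [f"pgi/community-{n}" for n in (1, 2, 3)]
)
mpi_libraries = (
    [f"mpich/{v}" for v in ("3.2.1", "3.1.4", "3.1")]
    + [f"openmpi/{v}" for v in ("2.1", "3.0", "3.1")]
)
petsc_versions = ["petsc/version-a", "petsc/version-b"]
openblas_versions = ["openblas/version-a", "openblas/version-b"]

# Every tool set a user could ask for is one element of the Cartesian product.
combinations = list(
    product(compilers, mpi_libraries, petsc_versions, openblas_versions)
)

print(len(compilers), "compilers,", len(mpi_libraries), "MPI libraries")
print(len(combinations), "possible combinations")
```

Each element of combinations is one tool set whose paths a user would otherwise have to configure by hand.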
As a user, I might build a parallel application using GNU 4.8, the default compiler for CentOS 7.3, and MPICH; however, the GNU 6.2 compilers have some unique features, so I might want to try building the same application with them. Environment modules allow me to select one tool set for production and a different tool set for development.
The secret to environment modules is manipulating environment variables (e.g., $PATH, $LD_LIBRARY_PATH, and $MANPATH) and changing them according to the tool combinations desired. Changing the tool set changes these environment variables accordingly. It is fairly simple conceptually and just takes a bit of coordination.
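To make the idea concrete, here is a minimal Python sketch of what selecting a tool amounts to: putting that tool's directories at the front of the relevant variables. The install prefix /opt/gnu/6.2 and the helper name prepend_path are assumptions for illustration only; this is not how any environment module tool is actually implemented.

```python
SEP = ":"  # separator used in $PATH-style variables on Linux


def prepend_path(env, name, directory):
    """Return a copy of env with directory first in the variable called name."""
    parts = [p for p in env.get(name, "").split(SEP) if p and p != directory]
    updated = dict(env)
    updated[name] = SEP.join([directory] + parts)
    return updated


# A simplified login environment (illustrative values).
env = {
    "PATH": "/usr/bin:/bin",
    "LD_LIBRARY_PATH": "",
    "MANPATH": "/usr/share/man",
}

# Hypothetical install prefix for the GNU 6.2 compilers.
prefix = "/opt/gnu/6.2"
env = prepend_path(env, "PATH", prefix + "/bin")
env = prepend_path(env, "LD_LIBRARY_PATH", prefix + "/lib64")
env = prepend_path(env, "MANPATH", prefix + "/share/man")

for name, value in env.items():
    print(f"{name}={value}")
```

Because the new directory goes first, the shell finds the selected compiler before the system default.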
You can use either the “classic” Environment Modules Tcl/C or Lmod; they both provide the same basic functionality. For the purposes of this article, I will be using Lmod, because it has some features I tend to use fairly often.
Lmod
Lmod is an environment module tool that provides simple commands for manipulating your environment for tool and library selections. A set of “modules,” which are really just text files, is written to modify $PATH, $LD_LIBRARY_PATH, and $MANPATH and possibly create other needed environment variables for the specific tool or library. A great feature of Lmod is that you can write these modules to define dependencies, as well, so that you do not use conflicting tools. By “loading” or “unloading” these modules, you can change your environment to use what you need. If you “purge” all of your modules, they are all unloaded and your $PATH, $LD_LIBRARY_PATH, and $MANPATH are returned to the values they had when you logged in to the system.
Lmod provides a complete set of tools for using and manipulating these module files. For example, you can:
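The load, unload, and purge behavior described above can be modeled in a few lines of Python. This is a toy model under my own assumptions (the class name ModuleSession, the install prefixes, and the bookkeeping are all hypothetical); it only illustrates why purging returns the variables to their login values.

```python
class ModuleSession:
    """Toy model of load/unload/purge; not Lmod's implementation."""

    def __init__(self, login_env):
        self.env = dict(login_env)
        self.loaded = {}  # module name -> list of (variable, directory)

    def load(self, name, changes):
        # Prepend each directory and remember what this module added.
        for variable, directory in changes:
            current = self.env.get(variable, "")
            self.env[variable] = directory + (":" + current if current else "")
        self.loaded[name] = list(changes)

    def unload(self, name):
        # Remove exactly the directories this module added.
        for variable, directory in self.loaded.pop(name):
            parts = self.env[variable].split(":")
            parts.remove(directory)
            self.env[variable] = ":".join(parts)

    def purge(self):
        for name in list(self.loaded):
            self.unload(name)


login_env = {"PATH": "/usr/bin:/bin", "LD_LIBRARY_PATH": "", "MANPATH": "/usr/share/man"}
session = ModuleSession(login_env)

# Hypothetical install prefixes.
session.load("gnu/6.2", [("PATH", "/opt/gnu/6.2/bin"),
                         ("LD_LIBRARY_PATH", "/opt/gnu/6.2/lib64")])
session.load("mpich/3.2.1", [("PATH", "/opt/mpich/3.2.1/bin"),
                             ("LD_LIBRARY_PATH", "/opt/mpich/3.2.1/lib")])
path_with_modules = session.env["PATH"]

session.purge()
print(path_with_modules)
print(session.env == login_env)
```

The key design point is that each module records what it added, so unloading is an exact reversal rather than a guess.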
- list available modules (only the modules compatible with the currently loaded modules appear) – module avail
- load and unload modules – module load [x] and module unload [x]
- purge all modules – module purge
- swap modules – module swap
- list loaded modules – module list
- query modules
- ask for help on modules – module help [x]
- show modules – module show [x]
You can also perform many other related tasks with less commonly used commands that are not listed here.
One of the coolest Lmod features is its ability to handle a module hierarchy: It only displays or lists available modules that depend on the currently loaded modules, preventing you from loading incompatible modules. This feature can help reduce unusual errors from mismatched modules, which are sometimes very difficult to diagnose. Because the module hierarchy is such an important and useful Lmod feature, I will explain it in a later section.
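As a rough illustration of the idea, the following Python sketch models a hierarchy as a table that maps "modules that must already be loaded" to "modules that become visible." The table contents, the PETSc version number, and the function name available are assumptions for this example, and the model says nothing about how Lmod stores or resolves its hierarchy internally.

```python
# Keys: modules that must already be loaded.
# Values: modules that become visible once those prerequisites are met.
# The PETSc version is a made-up placeholder.
HIERARCHY = {
    (): ["gnu/6.2", "gnu/8.1"],
    ("gnu/6.2",): ["mpich/3.2.1", "openmpi/3.1"],
    ("gnu/8.1",): ["mpich/3.2.1"],
    ("gnu/6.2", "mpich/3.2.1"): ["petsc/3.9"],
}


def available(loaded):
    """Return the modules visible given the currently loaded modules."""
    loaded_set = set(loaded)
    visible = set()
    for prerequisites, modules in HIERARCHY.items():
        if set(prerequisites) <= loaded_set:
            visible.update(modules)
    return sorted(visible)


print(available([]))
print(available(["gnu/8.1"]))
print(available(["gnu/6.2", "mpich/3.2.1"]))
```

In this model, petsc/3.9 never appears unless both gnu/6.2 and mpich/3.2.1 are loaded, which is the kind of mismatch protection a hierarchy is meant to provide.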
The first widely used environment module tool was Environment Modules Tcl/C, so called because the code is written primarily in C, with the module files in Tcl. Lmod is a different implementation of environment modules and uses module files written in Lua, a popular language in its own right and an embeddable language for applications. Lmod also retains the ability to read and use modules written in Tcl.
Module files are easy to create with the use of just a few functions for manipulating $PATH, $LD_LIBRARY_PATH, and $MANPATH values for the targeted tool, application, or library. Other environment variables also can be created easily in the module file as they are needed. The module file contains the dependencies, so Lmod can track that information to screen out incompatible modules from being listed or loaded. It is also highly recommended that you include help information in the module along with lots of documentation, such as what version of the tool the module file covers and when the module file was last modified (you can then compare the file's timestamp with the modification date recorded inside the file to see whether they match).
These module files are placed in a directory hierarchy (tree) in a specific location in the filesystem. Ideally, this filesystem is shared with the other nodes in the cluster; otherwise, you have to copy the module files in the specific hierarchy to all compute nodes and maintain them – including making sure they are all identical.
In the next section, I explain Lmod hierarchical modules, focusing on how to organize module files and how to limit the visibility of dependent module files. I will use an example from my own laptop to help illustrate this process. Additionally, I have tried to add comments about Lmod best practices, some of which I have gathered from email discussions with Robert McLay on the Lmod users mailing list and from the Lmod documentation and presentations of others.