Keeping It Straight: Environment Modules
Consider the following scenario: You are either an administrator or a user who wants to build the PETSc package from Argonne National Laboratory. The first step is to download the package and inspect the configure options, and you notice that two of the simpler options are the choice of MPI and BLAS libraries. Of course, you also need to choose a compiler. The task seems simple enough until you lay out the possible choices:
- Compiler: GNU, Open64, Intel, Portland Group, EKOPath, …
- BLAS Library: Atlas, OpenBLAS, Intel MKL, AMD ACML
- MPI: Open MPI, MPICH2, MVAPICH
The array of choices means you have quite a few build options. Clearly, you don’t need all of them, but it might be nice to have a small subset of choices to see which works best for you. To begin, you decide to build with an open source toolchain and move on to commercial options later. Thus, your current choice tree is:
- Compiler: GNU
- BLAS: Atlas, OpenBLAS
- MPI: Open MPI and MPICH2
giving you four possible combinations, so you decide to build them all. To keep things straight, you devise a flexible installation path that consists of package-name/version/compiler/mpi-library/blas-library. Although you start with four possibilities, new branches can be added by introducing new (or old) versions of PETSc, other BLAS and MPI libraries, or compilers:
/opt/petsc/3.3/gnu4/mpich2/atlas/
/opt/petsc/3.3/gnu4/mpich2/openblas/
/opt/petsc/3.3/gnu4/openmpi/atlas/
/opt/petsc/3.3/gnu4/openmpi/openblas/
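Because the tree is simply the product of the choices at each level, it can be enumerated mechanically. The sketch below prints the four prefixes listed above, in the same order (without the trailing slash). The function name petsc_prefixes is hypothetical, and the choice lists are just the ones from this example.

```shell
# Sketch: enumerate install prefixes that follow the
# package-name/version/compiler/mpi-library/blas-library scheme.
# petsc_prefixes is a hypothetical helper, not part of PETSc or Modules.
petsc_prefixes() {
  local root=/opt/petsc version=3.3 compiler=gnu4 mpi blas
  for mpi in mpich2 openmpi; do        # MPI choices
    for blas in atlas openblas; do     # BLAS choices
      echo "$root/$version/$compiler/$mpi/$blas"
    done
  done
}

petsc_prefixes
```

Each new list entry multiplies the count: adding mvapich to the MPI list, for example, would yield six prefixes instead of four.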
When you start to build your code, keeping all the environment paths straight becomes a bit of a challenge, so you create scripts that load the correct execution and library paths for each combination of build options. After a few days of building and plenty of coffee, you are finally ready to release your PETSc versions to the cluster. Once built, dynamically linked versions of your code need to know where to find the shared libraries. Simple enough, you think: I’ll just add the library path to LD_LIBRARY_PATH in my .bashrc file. Additionally, PETSc depends on the PETSC_DIR environment variable, so you also set this in your .bashrc file.
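A per-combination script of the kind described here might look like the following sketch. The function name petsc_env and the assumption that the shared libraries live in $PETSC_DIR/lib are illustrative only. The file has to be sourced (not executed) so that the exported variables reach the current shell.

```shell
# Sketch of a hand-rolled environment helper for one PETSc build.
# petsc_env is hypothetical; it assumes shared libraries live in $PETSC_DIR/lib.
# Usage (after sourcing this file): petsc_env <mpi> <blas>
petsc_env() {
  local mpi="$1" blas="$2"
  export PETSC_DIR="/opt/petsc/3.3/gnu4/$mpi/$blas"
  # Prepend the library directory, avoiding a stray ":" when the variable is unset.
  export LD_LIBRARY_PATH="$PETSC_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

Calling petsc_env openmpi atlas and later petsc_env mpich2 openblas in the same shell leaves both lib directories in LD_LIBRARY_PATH. Nothing in this approach undoes an earlier choice, which is one way the mess described next creeps in.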
If this exercise sounds familiar, you know that in almost all cases, the whole process starts to get too messy and mistakes are made. If you are doing this for a group of users, they have to change their .bashrc or .cshrc files, and you can only hope they get it correct. If you get a new compiler, that might introduce another whole branch in the build tree, and more confusion could result.
The Modular Solution
Of course, the environment path/library problem is not unique to HPC, but it is especially important in HPC because there is never a shortage of options when building applications. Plus, HPC systems often carry several implementations of the same standard library interface (e.g., MPICH2, Open MPI, and MVAPICH all implement MPI).
Unbeknownst to many users (HPC or otherwise), the software environment problem was addressed in the early 1990s when John L. Furlani introduced his Modules package, and life became much easier for many users. Modules can be loaded and unloaded dynamically and atomically, in a clean fashion. All popular shells are supported, including Bash, Ksh, Zsh, Sh, Csh, and Tcsh, as well as some scripting languages such as Perl. As an example, consider a system with two MPI implementations (MPICH2 and Open MPI). The following illustrates how easily the two environments can be managed with just two module commands, load and rm:
$ module load mpich2/1.4.1p1/gnu4
$ which mpiexec
/opt/mpi/mpich2-gnu4/bin/mpiexec
$ module rm mpich2/1.4.1p1/gnu4
$ module load openmpi/1.6.2/gnu4
$ which mpiexec
/opt/mpi/openmpi-gnu4/bin/mpiexec
Notice how the path to mpiexec changes along with the module. All other important environment settings for each MPI have changed as well. The details of this process are explained below, but a few other simple module commands are worth noting, too, such as avail and list. Continuing the previous example with the Open MPI module active, you can now load another module (Grid Engine) and list the currently loaded modules:
$ module load sge6
$ module list
Currently Loaded Modulefiles:
  1) openmpi/1.6.2/gnu4   2) sge6
Each module name refers to a module file that describes how to change (and restore) the environment. For instance, the module file openmpi/1.6.2/gnu4 is for Open MPI version 1.6.2 built with the gnu4 compilers; the actual module information is in the file named gnu4 within the openmpi/1.6.2 directory. To determine which modules are available, the user can list all the possible files using the avail option:
$ module avail
-------------------------- /opt/Modules/modulefiles -----------------------------
atlas/3.10.0/gnu4/i5-2400S        modules
blacs/1.1/gnu4/mpich2             mpich2/1.4.1p1/gnu4
blacs/1.1/gnu4/mpich2-omx         mpich2-omx/1.4.1p1/gnu4
blacs/1.1/gnu4/openmpi            null
blas/3.4.2/gnu4                   openblas/0.2.3/gnu4/i5-2400S
dot                               openmpi/1.6.2/gnu4
fftw/2.1.5/gnu4                   padb/3.3
fftw/3.1.2/gnu4                   petsc/3.3/gnu4/mpich2/atlas
fftw/3.3.2/gnu4                   petsc/3.3/gnu4/mpich2/openblas
fftw-mpi/2.1.5/gnu4/mpich2        petsc/3.3/gnu4/openmpi/atlas
fftw-mpi/2.1.5/gnu4/openmpi       petsc/3.3/gnu4/openmpi/openblas
fftw-mpi/3.3.2/gnu4/mpich2        scalapack/1.7.5/gnu4/mpich2/atlas
fftw-mpi/3.3.2/gnu4/mpich2-omx    scalapack/1.7.5/gnu4/mpich2/openblas
fftw-mpi/3.3.2/gnu4/openmpi       scalapack/1.7.5/gnu4/openmpi/atlas
gsl/1.15/gnu4                     scalapack/1.7.5/gnu4/openmpi/openblas
lapack/3.4.2/gnu4                 sge6
module-cvs                        use.own
module-info
As can be seen, for this system, you have plenty of module choices, which represent various build options for various packages. In particular, the PETSc modules look very much like the PETSc installation paths listed above. (Note: There is no requirement that the module files and paths reflect the actual installation path.)
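To see how a name such as openmpi/1.6.2/gnu4 turns into a file, it helps to know that the module command searches the directories listed in the MODULEPATH environment variable, in order, for a file with that relative path. The sketch below imitates only that lookup. find_modulefile is a hypothetical name, and the real command also handles cases this sketch ignores, such as choosing a default version when you give only openmpi.

```shell
# Sketch: resolve a module name to a modulefile by searching $MODULEPATH in order.
# find_modulefile is hypothetical; it only handles full names like openmpi/1.6.2/gnu4.
find_modulefile() {
  local name="$1" rest dir
  rest="$MODULEPATH:"
  while [ -n "$rest" ]; do
    dir=${rest%%:*}                  # first directory in the remaining list
    rest=${rest#*:}                  # drop it from the list
    if [ -n "$dir" ] && [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"     # first match wins
      return 0
    fi
  done
  echo "module not found: $name" >&2
  return 1
}
```

On the system shown above, MODULEPATH=/opt/Modules/modulefiles find_modulefile openmpi/1.6.2/gnu4 should print /opt/Modules/modulefiles/openmpi/1.6.2/gnu4.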
A Simple Module File
The following listing for the sge6 module file should look familiar if you know the Tcl language because module files are basically written in Tcl. Understanding the file is fairly simple, and you should recognize some of the standard environment variables.
#%Module1.0############################################################
##
## sge modulefile
##
proc ModulesHelp { } {
        puts stderr "\tThis module loads the appropriate environment for "
        puts stderr "\tusing gridengine Version 2011.11p1 \n"
}

module-whatis   sge6
module-log error none

set sgeroot /opt/gridengine
set arch [ exec $sgeroot/util/arch]

if [ module-info mode load ] {
        setenv       SGE_ROOT        $sgeroot
        append-path  MANPATH         $sgeroot/man
        prepend-path PATH            $sgeroot/bin/$arch
        append-path  LD_LIBRARY_PATH $sgeroot/lib/$arch
}

if [ module-info mode remove ] {
        unsetenv     SGE_ROOT
        remove-path  MANPATH         $sgeroot/man
        remove-path  PATH            $sgeroot/bin/$arch
        remove-path  LD_LIBRARY_PATH $sgeroot/lib/$arch
}
The module file can be broken into several sections. The first line, #%Module1.0, is the identifying line and must appear first in the file. Next, the ModulesHelp section holds the help message displayed when the module help sge6 command is entered. After that come some module settings, including the package root path (sgeroot) and the architecture (arch). (Note: arch is set from the output of a Grid Engine utility executed from within the module file.) Finally, the load and remove sections are where the environment is actually changed.
The load section does the following:
- Sets SGE_ROOT to the gridengine path
- Appends the gridengine/man path to MANPATH
- Adds the gridengine execution directory to the front of the PATH variable
- Adds the gridengine libraries to LD_LIBRARY_PATH
In a similar fashion, the remove section reverses all the changes made in the load section, returning things to the way they were.
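The load/remove symmetry can be illustrated outside of Tcl. The sketch below models the effect of prepend-path, append-path, and remove-path on a colon-separated value. It is a simplified model, not how modulecmd is implemented (for example, this remove_path drops every copy of the directory), and the function names and the arch value lx-amd64 are made up for the demonstration.

```shell
# Simplified, hypothetical models of the modulefile path commands.
# Each takes a colon-separated list and a directory, and prints the new list.
prepend_path() { printf '%s\n' "$2${1:+:$1}"; }
append_path()  { printf '%s\n' "${1:+$1:}$2"; }

remove_path() {
  local dir="$2" out="" rest="$1:" entry
  while [ -n "$rest" ]; do
    entry=${rest%%:*}                # next element of the list
    rest=${rest#*:}
    [ "$entry" = "$dir" ] || out=${out:+$out:}$entry   # keep non-matching entries
  done
  printf '%s\n' "$out"
}

# Demo on a scratch variable so the real PATH is left alone.
arch=lx-amd64                        # made-up value of Grid Engine's arch script
DEMO_PATH=/usr/bin:/bin
DEMO_PATH=$(prepend_path "$DEMO_PATH" "/opt/gridengine/bin/$arch")   # "load"
echo "$DEMO_PATH"                    # /opt/gridengine/bin/lx-amd64:/usr/bin:/bin
DEMO_PATH=$(remove_path "$DEMO_PATH" "/opt/gridengine/bin/$arch")    # "remove"
echo "$DEMO_PATH"                    # /usr/bin:/bin
```

Removing exactly what was added is what lets Modules swap one package for another without leaving stale directories behind.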
Other commands let you declare prerequisites (prereq) or conflicts (conflict), so that loading a module will not cause problems with other modules. Moreover, because module files are written in Tcl, you can program whatever configuration logic a module needs. Once Modules is installed, you can find many more options, features, and details in the module(1) and modulefile(4) man pages.
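As an illustration of prerequisites and conflicts, the sketch below writes a module file for one of the PETSc builds. The Tcl it writes uses the modulefile commands prereq, conflict, setenv, and prepend-path; the function name write_petsc_modulefile, the chosen prerequisite, and the lib subdirectory are assumptions. Unlike the sge6 file, it has no explicit remove section and relies on Modules to reverse setenv and prepend-path automatically when the module is unloaded.

```shell
# Sketch: create a PETSc modulefile under a modulefile directory.
# write_petsc_modulefile is hypothetical; pass the directory as $1.
write_petsc_modulefile() {
  local moddir="$1" file
  file="$moddir/petsc/3.3/gnu4/openmpi/atlas"
  mkdir -p "$(dirname "$file")"
  # Quoted 'EOF' keeps the shell from expanding the Tcl $variables.
  cat > "$file" <<'EOF'
#%Module1.0
## PETSc 3.3 built with gnu4, Open MPI, and ATLAS (illustrative)
module-whatis "PETSc 3.3 (gnu4, openmpi, atlas)"

# Refuse to load unless the matching MPI module is already loaded.
prereq openmpi/1.6.2/gnu4
# Refuse to load alongside another PETSc build or an MPICH2 module.
conflict petsc
conflict mpich2

set root /opt/petsc/3.3/gnu4/openmpi/atlas
setenv       PETSC_DIR       $root
prepend-path LD_LIBRARY_PATH $root/lib
EOF
  echo "$file"
}
```

Once that directory is on the module search path (for example, via module use), loading petsc/3.3/gnu4/openmpi/atlas should be refused until openmpi/1.6.2/gnu4 is loaded.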
Installing Modules
The Modules package is available from SourceForge. Building the current version, 3.2.9, is straightforward, although you need to have Tcl installed and must set the configure options to point to its library and header files (for simplicity, build without X):
# ./configure --with-tcl-lib=/usr/lib64/ --with-tcl-inc=/usr/include/ --without-x
# make
# make install
The default install is /usr/local, but the location of the module package, module files, and man pages can be changed with the appropriate configure options (run ./configure --help for more information).
If the above build was successful, you can run
/usr/local/Modules/3.2.9/bin/modulecmd sh
to check the install. If you get usage instructions, the build and install worked. If you get an error like
init.c(425):ERROR:159: Cannot initialize TCL
then Modules was unable to initialize Tcl. This usually means that the path to the init.tcl script compiled into your Tcl library is incorrect or that the Tcl installation is broken, so check your Tcl install.
Certain environment variables and aliases (functions) need to be set for Modules to work correctly. This task is handled by the Modules init files in /usr/local/Modules/default/init, a directory that contains a separate init file for each supported shell; default is a symbolic link to an installed Modules version. Note: The install does not seem to create the default link, but this is easily fixed by changing to /usr/local/Modules and, as root, entering:
ln -s 3.2.9/ default
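If you script your installs, the fix can be made repeatable. The sketch below (set_modules_default is a hypothetical name) creates or repoints the default link. It uses ln -sfn, where -n (supported by GNU and BSD ln) makes ln replace an existing default link instead of creating a new link inside the directory it points to.

```shell
# Sketch: point <base>/default at an installed Modules version.
# Usage (as root): set_modules_default /usr/local/Modules 3.2.9
set_modules_default() {
  local base="$1" version="$2"
  if [ ! -d "$base/$version" ]; then
    echo "no such version: $base/$version" >&2
    return 1
  fi
  # Relative target, like "ln -s 3.2.9/ default"; -f and -n replace an old link.
  ln -sfn "$version" "$base/default"
}
```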
The easiest way to initialize Modules is to use the standard /etc/profile.d mechanism. To do so, copy the global initialization files from the Modules source directory to /etc/profile.d:
# cp etc/global/profile.modules /etc/profile.d/modules.sh
# cp etc/global/csh.modules /etc/profile.d/modules.csh
When users log in, Modules will be initialized automatically for their current shell. The next step is to create module files. See the INSTALL file in the Modules source directory and the modulefile(4) man page for information on how to write them; the sge6 example above can also serve as a reference.
If you do not have admin privileges on your system, you can easily install Modules in your own directory and add the initialization to your .bashrc or .cshrc file. Of course, once your admin sees the advantages of Modules, he or she will want to install it globally.
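For a personal install, the .bashrc addition can be as small as sourcing the bash init file. The sketch below assumes your install has the same default/init layout as the system-wide one, under a root directory you pass in (defaulting to $HOME/Modules, an assumed location); init_private_modules is a hypothetical name.

```shell
# Sketch for ~/.bashrc: initialize a private Modules install if it exists.
# init_private_modules is hypothetical; $HOME/Modules is an assumed install root.
init_private_modules() {
  local root="${1:-$HOME/Modules}" init
  init="$root/default/init/bash"
  if [ -f "$init" ]; then
    . "$init"                  # defines the module command for this shell
    return 0
  fi
  echo "Modules init file not found: $init" >&2
  return 1
}
```

In .bashrc, a line such as init_private_modules && module use "$HOME/privatemodules" would then add a personal module file directory (the directory name here is hypothetical) to the search path.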
Conclusions
Finally, you can find a similar and largely compatible environment package called Lmod, sponsored by the Texas Advanced Computing Center and written in the Lua programming language, that supports both existing Tcl module files and Lua-based module files.
Modules is one of those packages that “just makes life easier.” Once you start using Modules, you will wonder how you ever managed without it. Those miscompiled, won’t-link, won’t-run projects will be a thing of the past, at least when you can trace the problems to a misconfigured environment!