Initial Best Practices for Containers
Containers provide reproducible, portable applications. In a sense, they are an evolution of virtual machines (VMs), but without a complete guest operating system (OS) of their own, because containers share the host's kernel. Therefore, containers are smaller, easier to move around, possibly easier to create, and typically run on bare metal, dispensing with the hypervisor abstraction.
Many people in the high-performance computing (HPC) world are adopting containers to help users and application developers package their software. The containers can then be shared with others, who then don’t have to rebuild an application and test it on their system – they just use the prebuilt application in the container.
Containers can also be used as a way to reproduce work. A container can package everything, starting with the application, datasets, output, and any written reports or papers. Including datasets can be problematic because they can inflate the size of the container, but if you are archiving the container, size isn’t as big a consideration.
The dominant container platform, Docker, was created so developers could share their work easily. Because developers usually have root access on their development systems, not much thought was given to security, so Docker needs to run with root privileges.
However, in the HPC world, where many people share a single system, this lack of security is a non-starter. The first, and arguably most popular, container for HPC is Singularity, which provides the security of running containers as a user rather than as root. It also works well with parallel filesystems, InfiniBand, and Message Passing Interface (MPI) libraries, something that Docker has struggled to do.
Despite Docker’s need to run with root access, it is still probably the most popular container platform overall. At the very least, the Docker image format is used by a number of other container tools, and Docker images can be converted to other formats, such as Singularity’s.
This article is the first in a series about best practices in building and using containers. The focus of these articles will be on HPC with Docker and Singularity container technologies, although other containers may be mentioned. In this first article, I discuss some of the initial best practices that apply to both Docker and Singularity. Subsequent articles will cover other best practices.
Building Containers
Before building a container, you usually start by creating a specification file (specfile) that is then used to build your container. In general, you define the base OS image and then add packages, code, libraries, and other things you want to include in your container.
The specfile is just a text file that the container build tool reads to create the container. The Docker and Singularity formats differ, but they are fairly close. The sample Docker specfile in Listing 1 was taken from the Nvidia Developer Blog. A sample Singularity specfile is shown in Listing 2.
Listing 1: Docker Specfile
```dockerfile
FROM nvidia/cuda:9.0-devel AS devel

# OpenMPI version 3.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        hwloc \
        openssh-client \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
    cd /tmp/openmpi-3.0.0 && ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda --with-verbs && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
```
Listing 2: Singularity Specfile
```
BootStrap: docker
From: nvidia/cuda:9.0-devel

# OpenMPI version 3.0.0
%post
    apt-get update -y
    apt-get install -y --no-install-recommends \
        file \
        hwloc \
        openssh-client \
        wget
    rm -rf /var/lib/apt/lists/*
%post
    mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j
    cd /tmp/openmpi-3.0.0 && ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda --with-verbs
    make -j4
    make -j4 install
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
%environment
    export PATH=/usr/local/openmpi/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
```
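To make the correspondence between the two formats concrete: a Docker RUN instruction plays roughly the role of a Singularity %post section, and ENV plays the role of %environment. The following sketch renders one neutral list of build steps into either format. The Step class and render() function are hypothetical, written only for illustration; they ignore quoting, multistage builds, and most other directives.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One hypothetical build step: shell commands plus environment variables."""
    commands: list = field(default_factory=list)
    env: dict = field(default_factory=dict)


def render(base_image, steps, fmt):
    """Render a neutral list of Steps as a Dockerfile or a Singularity definition file."""
    if fmt == "docker":
        lines = [f"FROM {base_image}"]
        for step in steps:
            if step.commands:
                # One RUN per step: the commands share a shell and produce one layer.
                lines.append("RUN " + " && \\\n    ".join(step.commands))
            if step.env:
                pairs = [f"{key}={value}" for key, value in step.env.items()]
                lines.append("ENV " + " \\\n    ".join(pairs))
    elif fmt == "singularity":
        lines = ["BootStrap: docker", f"From: {base_image}"]
        for step in steps:
            if step.commands:
                # One %post section per step, one command per line.
                lines.append("%post")
                lines.extend(f"    {cmd}" for cmd in step.commands)
            if step.env:
                lines.append("%environment")
                lines.extend(f"    export {key}={value}" for key, value in step.env.items())
    else:
        raise ValueError(f"unknown format: {fmt}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    steps = [
        Step(commands=["apt-get update -y",
                       "apt-get install -y --no-install-recommends wget"]),
        Step(env={"PATH": "/usr/local/openmpi/bin:$PATH"}),
    ]
    for fmt in ("docker", "singularity"):
        print(render("nvidia/cuda:9.0-devel", steps, fmt))
```

The point of the toy is that the content of the steps is identical in both outputs; only the surrounding syntax changes, which is why the two formats feel so similar.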
A specfile can be short and simple, or it can be long and complex. Long specfiles in particular can be difficult to maintain and edit.
HPC Container Maker
To create or build a container, you can write a specfile that contains the steps needed to build a container. You can find a number of tutorials online for writing these specfiles for both Docker and Singularity. As you get more experience writing them, you discover that they get longer and longer, and sometimes you forget what the specfile does or why you included certain steps. Moreover, you might not include best practices for certain tools or libraries you use in the container, giving up performance.
One best practice I recommend for building specification files is to use HPC Container Maker (HPCCM), which I’ve written about in the past. Written in Python, HPCCM allows you to write a very simple Python script that describes your container; HPCCM then generates a Docker or Singularity specfile from it. You can also use branching in a single HPCCM recipe to create multiple container specifications.
HPCCM has some notable features:
- It collects and codifies best practices.
- It makes recipe file creation easy, repeatable, and modular.
- It becomes a reference and a vehicle to drive collaboration.
- It is container-implementation-neutral.
Rather than create yet another domain-specific language, HPCCM relies on Python for the “recipe” of the container you want to build, regardless of the target container type. The HPCCM recipe contains the steps you want to take in your container build. You can take advantage of the Python language within the recipe by creating variables and using if/elif/else statements, loops, functions, or almost anything else from Python.
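The Stage0 += ... idiom is ordinary Python operator overloading: an object that defines __iadd__ can collect build steps in order, and the recipe around it can use any control flow it likes. The sketch below is a toy stand-in, not HPCCM's implementation; ToyStage, toy_apt_get(), and make_recipe() are hypothetical names, and the steps are plain strings rather than real building blocks.

```python
class ToyStage:
    """Hypothetical stand-in for a recipe stage: a base image plus ordered steps."""

    def __init__(self, name):
        self.name = name
        self.base = None
        self.layers = []

    def baseimage(self, image):
        self.base = image

    def __iadd__(self, layer):
        # `stage += layer` appends the step and returns the same stage object,
        # which is all the `Stage0 += ...` idiom needs from Python.
        self.layers.append(layer)
        return self


def toy_apt_get(ospackages):
    """Hypothetical building block: describe an apt-get install step as a string."""
    return "apt-get install -y " + " ".join(ospackages)


def make_recipe(distro="ubuntu:16.04", with_fortran=True, extra_tools=()):
    """Build a ToyStage using ordinary Python variables, branches, and loops."""
    stage = ToyStage("Stage0")
    stage.baseimage(distro)
    stage += toy_apt_get(["make", "wget", "bzip2", "tar"])

    compilers = ["gcc", "g++"]
    if with_fortran:                 # branch: optional Fortran compiler
        compilers.append("gfortran")
    stage += toy_apt_get(compilers)

    for tool in extra_tools:         # loop: one step per extra tool
        stage += toy_apt_get([tool])
    return stage


if __name__ == "__main__":
    for layer in make_recipe(with_fortran=False, extra_tools=("git",)).layers:
        print(layer)
```

Because the recipe is just a Python program, the same file can produce different containers depending on a flag, which is the kind of branching that lets one recipe cover several variants.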
The example in Listing 3, from my previous article on HPCCM, shows a sample HPCCM recipe written in Python. The recipe starts with a base image (Ubuntu 16.04), then installs some packages that aren’t included in that image (i.e., make, wget, bzip2, and tar) with the built-in apt_get building block, whose ospackages option lists the packages to add to the container. The recipe then does the same thing for the GNU compilers. Notice that because no compiler versions are specified, the default versions packaged for that OS release are installed.
Listing 3: HPCCM Recipe
"""This example demonstrates recipe basics. Usage: $ hpccm.py --recipe test2.py --format docker # hpccm.py --recipe test2.py --format singularity """ # Choose a base image Stage0.baseimage('ubuntu:16.04') ospackages = ['make', 'wget', 'bzip2', 'tar'] Stage0 += apt_get(ospackages=ospackages) # Install GNU compilers (upstream) Stage0 += apt_get(ospackages=['gcc', 'g++', 'gfortran']) Stage0 += openmpi(cuda=False, infiniband=False, prefix='/usr/local/openmpi', version='3.1.0')
The last line in the recipe adds the OpenMPI library to the specfile. In this case, it tells HPCCM not to include the commands for building with GPU (CUDA) or InfiniBand support and to build the library and install it into /usr/local/openmpi. Finally, it tells HPCCM to use OpenMPI version 3.1.0. When you use HPCCM to create a specfile, all of the commands needed to download the source code and its dependencies are added to the specfile, along with all of the build commands.
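The mapping from the openmpi() options to the generated commands is fairly mechanical. The sketch below, a hypothetical helper rather than HPCCM's own code, derives the download URL from the version string and turns the cuda and infiniband options into the --with/--without configure flags that appear in Listings 1 and 4. The default argument values are my own choice for the example.

```python
def openmpi_build_commands(version="3.0.0", prefix="/usr/local/openmpi",
                           cuda=True, infiniband=True):
    """Hypothetical helper: shell commands to download, configure, build, and
    install OpenMPI, mirroring the options passed to openmpi() in the recipe."""
    major_minor = ".".join(version.split(".")[:2])  # "3.1.0" -> "3.1"
    tarball = f"openmpi-{version}.tar.bz2"
    url = f"https://www.open-mpi.org/software/ompi/v{major_minor}/downloads/{tarball}"
    srcdir = f"/tmp/openmpi-{version}"
    flags = [
        f"--prefix={prefix}",
        "--disable-getpwuid",
        "--enable-orterun-prefix-by-default",
        "--with-cuda" if cuda else "--without-cuda",          # cuda=False
        "--with-verbs" if infiniband else "--without-verbs",  # infiniband=False
    ]
    return [
        f"mkdir -p /tmp && wget -q --no-check-certificate -P /tmp {url}",
        f"tar -x -f /tmp/{tarball} -C /tmp -j",
        f"cd {srcdir} && ./configure " + " ".join(flags),
        "make -j4",
        "make -j4 install",
        f"rm -rf /tmp/{tarball} {srcdir}",
    ]


if __name__ == "__main__":
    for cmd in openmpi_build_commands(version="3.1.0", cuda=False, infiniband=False):
        print(cmd)
```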
Note that the recipe uses “stages,” starting with Stage0, which allow you to create multistage builds. You can keep adding items (e.g., specific packages) to a stage or have a particular stage build packages. To create a multistage specfile, you start with Stage0, then move on to Stage1, Stage2, and so on.
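A common reason to use multiple stages is to compile in a large development image and then copy only the installed files into a smaller runtime image. In Docker syntax, that relies on naming a stage with FROM ... AS name and copying from it with COPY --from=name. The render_multistage() function below is a hypothetical sketch of that layout, not what HPCCM emits, and the image names in the usage example are placeholders.

```python
def render_multistage(stages, copies):
    """Render a toy multistage Dockerfile.

    stages -- list of (stage_name, base_image, [shell commands]), in build order
    copies -- list of (source_stage, path) copied into the final stage
    """
    blocks = []
    for i, (name, image, commands) in enumerate(stages):
        block = [f"FROM {image} AS {name}"]
        block += [f"RUN {cmd}" for cmd in commands]
        if i == len(stages) - 1:
            # Only the final stage pulls artifacts out of the earlier stages,
            # so compilers and sources stay behind in the build stage.
            block += [f"COPY --from={src} {path} {path}" for src, path in copies]
        blocks.append("\n".join(block))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_multistage(
        stages=[("devel", "nvidia/cuda:9.0-devel", ["make -j4 install"]),
                ("runtime", "nvidia/cuda:9.0-runtime", [])],
        copies=[("devel", "/usr/local/openmpi")],
    ))
```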
After writing the HPCCM recipe, save it to a file (e.g., test2.py); HPCCM can then take that recipe and create a Docker or Singularity specfile. The command for creating the Singularity specfile in Listing 4 is:
$ hpccm --recipe ./test2.py --format singularity > Singularity2
Listing 4: The Singularity2 Specfile
```
BootStrap: docker
From: ubuntu:16.04

%post
    apt-get update -y
    apt-get install -y --no-install-recommends \
        make \
        wget \
        bzip2 \
        tar
    rm -rf /var/lib/apt/lists/*

%post
    apt-get update -y
    apt-get install -y --no-install-recommends \
        gcc \
        g++ \
        gfortran
    rm -rf /var/lib/apt/lists/*

# OpenMPI version 3.1.0
%post
    apt-get update -y
    apt-get install -y --no-install-recommends \
        file \
        hwloc \
        openssh-client \
        wget
    rm -rf /var/lib/apt/lists/*
%post
    mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.1/downloads/openmpi-3.1.0.tar.bz2
    tar -x -f /tmp/openmpi-3.1.0.tar.bz2 -C /tmp -j
    cd /tmp/openmpi-3.1.0 && ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --without-cuda --without-verbs
    make -j4
    make -j4 install
    rm -rf /tmp/openmpi-3.1.0.tar.bz2 /tmp/openmpi-3.1.0
%environment
    export LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
    export PATH=/usr/local/openmpi/bin:$PATH
%post
    export LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
    export PATH=/usr/local/openmpi/bin:$PATH
```
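Generated specfiles such as Listing 4 can get long, with several %post sections. A quick way to review one is to split it into its header keywords and its sections. The parse_definition() function below is a minimal sketch, assuming sections start with a line beginning with % and that blank and comment lines can be dropped; it is not a complete parser for the Singularity definition-file format.

```python
import re
from collections import Counter

# A section header is a line such as "%post" or "%environment" (optionally with arguments).
SECTION = re.compile(r"^%(\w+)\b")


def parse_definition(text):
    """Return (header, sections) for a Singularity definition file.

    header   -- dict of "Key: value" lines before the first section
    sections -- ordered list of (name, body_lines); blank and comment lines are dropped
    """
    header = {}
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SECTION.match(line)
        if match:
            sections.append((match.group(1), []))
        elif sections:
            sections[-1][1].append(line)
        elif ":" in line:
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
    return header, sections


if __name__ == "__main__":
    spec = """BootStrap: docker
From: ubuntu:16.04
%post
    apt-get update -y
%environment
    export PATH=/usr/local/openmpi/bin:$PATH
%post
    export PATH=/usr/local/openmpi/bin:$PATH
"""
    header, sections = parse_definition(spec)
    print(header)
    print(Counter(name for name, _ in sections))
```

Running it on a generated specfile gives a quick count of how many %post and %environment sections each building block produced.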
To build a Singularity container from that specfile, run the following command (building from a definition file normally requires root privileges):
$ sudo singularity build test2.simg Singularity2
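Because one recipe can target both formats, it is convenient to generate both specfiles in one step. The sketch below calls the hpccm command shown earlier once per format and saves the output. The function names and output filenames (Dockerfile, Singularity) are my own choices, and it assumes hpccm is installed and on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def hpccm_command(recipe, fmt):
    """Build the hpccm command line for one output format."""
    if fmt not in ("docker", "singularity"):
        raise ValueError(f"unsupported format: {fmt}")
    return ["hpccm", "--recipe", str(recipe), "--format", fmt]


def write_specfiles(recipe, outdir="."):
    """Run hpccm once per format and save the resulting specfiles in outdir.

    Raises FileNotFoundError if hpccm is not on PATH and
    subprocess.CalledProcessError if hpccm exits with an error.
    """
    if shutil.which("hpccm") is None:
        raise FileNotFoundError("hpccm is not on PATH")
    outputs = {"docker": "Dockerfile", "singularity": "Singularity"}
    written = []
    for fmt, filename in outputs.items():
        result = subprocess.run(hpccm_command(recipe, fmt),
                                capture_output=True, text=True, check=True)
        path = Path(outdir) / filename
        path.write_text(result.stdout)
        written.append(path)
    return written


if __name__ == "__main__":
    for path in write_specfiles("./test2.py"):
        print("wrote", path)
```

With both files in the current directory, docker build -t test2 . builds the Docker image, and sudo singularity build test2.simg Singularity builds the Singularity image from the same recipe.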
HPCCM is a great help in writing specfiles for Singularity and Docker, making it much easier to add and build packages for your containers. A single HPCCM recipe file can work for both Docker and Singularity and a variety of distributions. These recipe files are simple and much easier to understand than other types of specfiles, and I’ve found them to be indispensable for building containers.