Interview with the Developer of Singularity
Sometimes we see the names of people working on the Linux kernel or other high-profile projects, but we don't hear much about the people working behind the scenes who contribute their knowledge for the greater good. Literally thousands of very hard-working and extremely intelligent people develop open source software – most of the time for Linux.
This is particularly true for HPC. In the TOP500 list of November 2015, Linux accounted for 98.8% of the systems. I would call this a pretty dominant position; yet, many of the developers writing HPC software on Linux are unknown.
Gregory M. Kurtzer has been working in HPC and Linux for many years. He has influenced the computing landscape and developed tools that you might be using today without even knowing it. In this interview, I talk to Greg about his background and some of his projects in general and about his latest initiative, Singularity, in particular. (Also see the article on Singularity.)
Jeff Layton: Hi Greg, tell me a bit about yourself and your background.
Gregory M. Kurtzer: Oh, how I love talking about myself, but I will try to contain my enthusiasm and keep this relevant and concise. My work with Linux began back in the mid-90s, after I obtained my degree in biochemistry and focused early on genomics and bioinformatics. Along this path I became enamored with the idea of open source software and changed my focus from bio to computational science. I job-hopped a bit until I landed my current position at Lawrence Berkeley National Laboratory in October 2000.
Since then I have been very lucky to be part of some amazing science, with my work focused specifically on high-performance computing (HPC). I have also taken part in, contributed to, and founded several well-known open source projects (among them, GRAB, Warewulf, Caos Linux, Centos Linux, Perceus, and, most recently, Singularity). Each of the projects that I have worked on was motivated by necessity, and being a strong advocate of the open source development model, I always prefer to release my work back to the community.
JL: What was your involvement with Centos?
GMK: In a nutshell, I founded and ran the Caos Foundation, which is the organization that created Centos. I led Centos for a few years starting from its inception.
JL: Can you elaborate on how Centos came to be?
GMK: Sure, but I will have to give you the PG version both for our younger readers as well as to protect the guilty. (If you want the gory details, well that will cost at least a good dinner!)
Back in 2002-ish, I saw the need for a community-maintained, RPM-based distribution of Linux (Fedora did not exist as we recognize it today), so I coordinated with various people and assembled a team of developers focused on filling this need. We were in the process of creating a new distribution of Linux called “Caos Linux” (Community Assembled Operating System) and were picking up quite a bit of momentum when Red Hat decided to change their product lineup and business model. They chose to EOL (end of life) all of their freely available distributions (then called “Red Hat Linux”) and supersede them with RHEL (Red Hat Enterprise Linux), which could only be obtained through a support contract. This change affected the entire community of Red Hat Linux users, including us, because we still needed an operating system as a build and bootstrap host. One of our team members volunteered to rebuild the RHEL source RPMs so we could still have access to a secure and up-to-date development platform.
After deliberation with the group, I decided we should release this code by removing the Red Hat trademarks and publishing the resulting product under a new name. The first name we used was “Caos EL,” but “Centos” was proposed to me by one of the contributors, and I concurred, so it was used publicly. During my leadership, the project went from unknown to a common household name.
I ran the project for a couple of years until politics and private agendas undermined the structure of the group, and I decided to resign as lead and dissociate Centos from the project in which it was born. The final straw was as follows: I maintained the perspective that Red Hat should be praised and accommodated for allowing Centos to exist and even thrive, but several others on the team antagonized Red Hat (and their legal team) to the point of putting me personally in the crosshairs of a legal and ethical confrontation. The project survived only due to the dedication of the engineers, contributors, and supporters.
JL: Commonly, it is referred to as “CentOS” but you call it “Centos.” Is there anything to that?
GMK: I liked the name Centos because it works from two perspectives. The first is the root “cento,” which is a literary work composed from pieces of other literary works – quite appropriate for a Linux distribution. The second is the sorta-acronym “Community ENTerprise Operating System.”
While people are free to say and type it however they like, I always preferred Centos over CentOS, because the latter shifts the emphasis and changes the root of the name to “cent,” which has an obvious connotation of money – and even though a cent is a small amount of money, Centos is free.
JL: Where do you currently work and what do you do?
GMK: I work for Lawrence Berkeley National Laboratory on a joint appointment with UC [University of California] Berkeley. My working title is High Performance Computing (HPC) Architect, and I am also the Technical Leader of my group. It is a fancy way of saying that I help where needed, make sure that the technology we adopt fits in the “big picture,” architect solutions for previously unsolved problems, coordinate group members so that we are all moving in the same technical direction, and, most importantly, know when to let the team do what they do best and bring lunch.
Working for the Government and UC system has various advantages; for example, I am able to take part in relevant open source projects. I currently lead the open source Warewulf cluster management toolkit, which is almost 15 years old now. (Time seriously flies when you are having fun!) Warewulf is commonly used in HPC clusters, and it allows one to provision and manage a very large number of compute resources (nodes). It has been used to provision and manage some of the biggest supercomputers in the world. At present I am working on the architecture for the next major version of Warewulf (v4).
I am also part of a new open source initiative being formed called OpenHPC hosted by the Linux Foundation, where I will be serving on the technical steering committee as a “Component Development Representative.”
Another project I recently started targets our need to support containers on our HPC resources.
JL: Why did you decide to create a new container system?
GMK: As I mentioned, every project and development effort I have undertaken is motivated directly by necessity.
There is a growing interest from our user community in using containers (or, more specifically, Docker) to satisfy the need for mobility of compute. But Docker imposes various complexities when implemented on a shared HPC resource. For example, users can easily escalate to root within a Docker image they have created, and if any unauthorized user gains root access on our production network, that is a security breach. This means that we must narrow what Docker containers are exposed to. While this is fairly straightforward to do, it limits the scope of access to resources considerably. For example, on an HPC resource, much of the high-performance value exists in the scalable filesystems and high-performance network fabric (which is usually InfiniBand on Linux clusters), neither of which can be securely exposed to an environment in which users have root access.
With this model of segregating all Docker instances, the very best we can do today is to create virtual sub-clusters, which not only moves us in a direction of increasing complexity (from an operational, architectural, and usability perspective) but also precludes direct access to the high-performance components of the resource; in effect, we would be taking the HP out of HPC! We gain very little aside from buzzword compliance.
So let's re-evaluate: What problem do we need to solve? Enabling mobility of compu…. Portabili…. High performan…. Supporting existing workflows and existing resource architectures …. These are the problems that we need to solve, and the existing container systems do not provide a directly applicable solution, which is why I created Singularity.
JL: How is Singularity different from other container systems?
GMK: Most people's common understanding of operating system-level virtualization (a.k.a. containers) comes from a direct association and lineage to hardware-level virtualization (VMware, KVM/Qemu, VirtualBox, etc.). This is supported by an obvious inheritance of features as well as target use cases. While containers do provide a more efficient manner of homogeneous operating system virtualization than full hardware virtualization, this relation imposes similarities in the high-level architecture and thus in the roles of the virtualized systems. For example, in a hardware-level virtualized environment, operating system instances need network separation, so each instance appears to have its own network address. You may also want to limit resources such as memory, CPU cores, or disk space. Each instance also requires a full operating system image, so updates and package installations use standard means (e.g., Yum/Apt). To mimic these features in containers that share the kernel with the host, you must start off with a fully installed, fully functional userspace guest operating system, use cgroups to impose resource limitations, and then separate out all available namespaces.
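As a rough illustration of those building blocks (generic Linux here, not Singularity and not any particular container engine), the util-linux unshare tool can assemble the same kind of namespace separation by hand; conventional container runtimes layer cgroup limits and a full guest image on top of exactly this:

  # Assumes util-linux unshare and root privileges; for illustration only
  sudo unshare --fork --pid --mount-proc --net --uts /bin/bash
  # Inside the new shell:
  #   ps aux       shows only the new PID namespace (bash is PID 1)
  #   ip addr      shows only an isolated, down loopback (separate network namespace)
  #   hostname x   changes the hostname only inside this UTS namespace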
Singularity, on the other hand, is not designed to emulate hardware-level virtualization at all. For example, Singularity does not support user contexts within a container. The applications within the container are always running as the user that launched them. Thus, whatever access the user has outside the container on the host system is the same access as the user has within the container. There is no supported means for privilege escalation, so the barrier between inside and outside the container can be blurred. Any resource limitations already imposed on the user still take precedence, so no additional controls must be set in place. Also, Singularity containers are not full operating system images; they only have what is essential to support the program or workflow within them.
Simply put, the goal of Singularity is to focus on the application and its mobility (portability). When this is done correctly, the container becomes transparent – unseen and unnoticed. Users can interact with the programs within the container and with files outside the container just as easily as if they had launched the programs directly, as if there were no container. As you can see, in this usage model some namespace separation is necessary, but too much will disrupt the workflow.
Singularity then becomes as much a packaging system as it is a container solution. Using a Singularity container should be as easy as just executing it. As an analogy, an Apple .app package bundle contains all of the files, scripts, libraries, and run-time dependencies necessary for it to be easily distributed and executed on arbitrary OS X systems. Singularity does something similar for Linux and additionally leverages containment to make the SAPP files portable between distributions with library version or ABI mismatches. It even works across C library version differences.
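A hypothetical terminal session illustrates the point (the container name and paths are invented, and argument pass-through to the wrapped program is assumed, as the direct-execution model implies):

  $ id -un                                    # my identity on the host
  jlayton
  $ ./align.sapp ~/data/reads.fq -o ~/results/out.bam
  $ ls -l ~/results/out.bam                   # the program ran inside the container,
  -rw-r--r-- 1 jlayton jlayton ... out.bam    # but read and wrote my files, as me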
JL: How do you run Singularity containers?
GMK: Singularity-contained applications (SAPPs) are built using a specfile. The specfile defines which files will be included in the container and what happens when the SAPP file is executed. While there are multiple methods for running a Singularity container, one method is as easy as executing the container file directly (e.g., ./my-container.sapp).
You can define the container to wrap a single command such that executing the SAPP file itself will behave exactly like the command that you wrapped. Or, you can define the container to process a custom workflow when it is executed, in effect creating a custom application stack. The end user simply runs the SAPP file container as a command-line program. Inside the container, this may trigger a cascade of events with workflows involving multiple programs, static data, pipes, scripts – all using the core libraries, programs, and files from the system it was built on to achieve reproducible results and mobility of compute.
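A hypothetical pair of invocations shows both styles (container names and arguments are invented for illustration):

  # A SAPP that wraps a single program behaves just like that program:
  $ ./bwa.sapp mem ref.fa reads.fq > aln.sam

  # A SAPP that encodes a whole workflow is still one command to the end user;
  # internally it may chain multiple programs, scripts, pipes, and static data:
  $ ./variant-pipeline.sapp sample01/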
JL: How about running GUI-based applications that use X windows in a Singularity container?
GMK: Absolutely. Keep in mind that it is up to the packager of the Singularity container to make sure that all of the relevant components, files, fonts, and bits are included in the container. Singularity is smart enough to package up something simple like xterm if you just include /usr/bin/xterm in your specfile, and it will work as expected. For more complicated X applications, however, you may need to include more. Singularity's dependency solvers are always under development, and as they mature, packages will become easier to manage.
At present, several dependency solvers have been developed, and Singularity already knows how to deal with linked libraries, script interpreters, Perl, Python, R, and OpenMPI. An example of this can be seen if you want to create a container to execute a Python script. The specfile could literally be one line long (e.g., /<path/to/script>.py), and Singularity will automatically see that it is a Python script, pull in Python, search for the necessary Python text and binary dependencies, and then package everything up into an executable container.
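A concrete sketch of that Python case (the file names and the .spec extension are invented; only the one-line-specfile idea comes from the description above):

  $ cat blast-report.spec        # the entire specfile: one line naming the script
  /home/jlayton/bin/blast-report.py
  # Building from the specfile (exact build command omitted here) yields an
  # executable container that behaves like the script itself:
  $ ./blast-report.sapp results.xml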
Adding additional dependency solvers is pretty easy as long as we know what to include. I encourage people to let us know (with the GitHub issue tracker) how we can improve on the dependency resolution of any given software type.
JL: With your HPC background, how does Singularity operate with parallel applications (MPI)?
GMK: For those outside of the HPC world, MPI (Message Passing Interface) is a specification which defines an API and usage model for distributing parallel applications. While there are several methods to achieve this, MPI is the most common and highly standardized method to parallelize applications at scale. Additionally, it is one of the most complicated use cases for containers.
This is because MPI relies on a highly tuned environment that may be specific to a hardware architecture and kernel build, so distributing a portable binary that can interact properly on these hosts becomes challenging. Additionally, as you can imagine, coordinating containers to launch on multiple host computers also proves to be challenging. Take into consideration, too, that these jobs are typically launched via a batch scheduler, which adds even more complexity.
Long story short (perhaps too late), the architecture of Singularity lends itself to being an appropriate solution for this as well. Singularity naturally runs the applications within the container as if they are not contained, which greatly simplifies the integration with MPI. That, coupled with the traditional MPI process invocation pathway, allows the MPI process daemon to run Singularity, which in turn launches the container and then execs itself to run as the MPI application within the container. The application runs and loads the MPI-specific dynamic libraries, which in turn connect back to the MPI process daemon (outside the container) via the Process Management Interface (PMI), and then everything runs as expected. Again, because of Singularity's run-time architecture, solving this problem and adding the appropriate hooks in both Singularity and MPI is orders of magnitude easier than with other container solutions and allows for seamless integration with the batch scheduling subsystems.
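In practice, that launch chain can look like a perfectly ordinary MPI job (the solver and input names are invented; mpirun and its -np flag are standard MPI, nothing Singularity-specific):

  # The MPI runtime starts the SAPP on each node; Singularity execs the solver
  # inside the container, and the ranks connect back to the MPI daemons via PMI.
  $ mpirun -np 128 ./cfd-solver.sapp input.cfg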
Singularity solves a lot of the issues facing infrastructure and service providers: everything from scheduling and supporting containers to access to the existing physical and software environment (e.g., parallel filesystems, InfiniBand, etc.). User support stays where it should be – on the host system – and any debugging of Singularity containers is up to the respective packagers. Singularity also makes a giant leap toward a notion that lots of HPC centers are trying to claim – “push-button computing” – the idea that accessing an HPC resource is as simple as uploading a job and/or data to a portal and being notified when the job is done. With Singularity, for the first time, we can see how this is truly possible.
JL: What is the current state and future of Singularity?
GMK: At the time of this writing, Singularity is quickly approaching its first release. I am behind on the 1.0 milestone because I initially did not plan on releasing with MPI support, but conditions aligned to include it in the release, and that was worth the delay. Check the web pages and join the Google group for updates and to stay apprised of the progress and release. You can also find documentation on how to install and use both the pre- and post-releases on the documentation pages.
Once it is released, I have three main goals: (1) obviously, to continue development, making Singularity more reliable, faster, and capable of easily packaging more types of applications; (2) to create a repository of containers to which people can upload their work so others can download pre-made SAPPs; and (3) to get Singularity included in the various distributions. To achieve all three goals, I need help, and I hope more people will join the project and be part of the progress!
JL: If someone wants to get a hold of you or follow your work, what is the best way?
GMK: If you do want to follow my work, I can be found on GitHub: http://gmkurtzer.github.io.