The achievements of and plans for systemd
Extending Integration
Linux Magazine: If you take stock of the last three or four years, what have been the most important innovations in systemd during this time?
Lennart Poettering: That would be, firstly, all the security features we have added and made visible with the systemd-analyze security tool. Regular system services can now be locked into effective sandboxes with relative ease, but can still be integral parts of the host operating system. I believe this has advanced Linux system security quite a bit.
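As an illustration, such a sandbox is built from standard directives in systemd.exec(5); the service name here is hypothetical, and the result can be inspected with systemd-analyze security:

```ini
# /etc/systemd/system/foo.service.d/hardening.conf
# Hypothetical service "foo"; all directives are standard systemd.exec(5) options.
[Service]
# Run without privileges and forbid regaining them
NoNewPrivileges=yes
# Private /tmp and a read-only view of the rest of the file system
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
# Hide kernel tunables and control groups from the service
ProtectKernelTunables=yes
ProtectControlGroups=yes
# Restrict system calls to a reasonable default set for services
SystemCallFilter=@system-service
```

Running systemd-analyze security foo.service afterwards shows how each directive lowers the service's exposure score.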
Another important innovation might be systemd-tmpfiles and systemd-sysusers. Strictly speaking, they are more than four or five years old, but it is only in the last three or four years that they have finally seen more widespread use in the popular distributions. We are looking to move to a declarative description of the system and its components, leaving behind imperative scriptlets in packages and the like. This improves robustness, security, and reproducibility.
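A minimal sketch of this declarative style, with the file names, the user, and the directory invented for illustration:

```
# /usr/lib/sysusers.d/example.conf  (hypothetical)
# Declare a system user instead of calling useradd from a package scriptlet
u exampled - "Example daemon user"
```

```
# /usr/lib/tmpfiles.d/example.conf  (hypothetical)
# Create a runtime directory owned by that user at boot
d /run/exampled 0750 exampled exampled -
```

Both files are processed at boot (and on package installation), so the system's users and directories are described as state rather than created imperatively.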
The dynamic user strategy makes it possible to allocate system users dynamically when starting system services that are automatically released again when the service terminates. This takes into account that system users are the original mechanism used to implement privilege separation on Unix and Linux. No matter which subsystem you look at, access control based on users is always implemented on Linux. Other concepts – such as SELinux labels, Access Control Lists (ACLs), other Mandatory Access Controls (MACs), and so on – are not universally available and are nowhere near as popular or as universally well understood.
Classically, however, such system users are expensive, with only 1,000 of them (or sometimes only 100 or 500, depending on the distribution), and they are allocated individually during package installation. So traditionally they can only be used roughly to secure large services but not to protect individual instances or transactions. There are simply too few of them for that. The dynamic user concept solves the dilemma: It makes cheap what was previously expensive. Dynamic users can be allocated for a short time and returned after use. This practically breathes new life into an old Unix strategy and is a mechanism that can definitely contribute a great deal to further improving the security of Linux systems.
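In unit-file terms, this boils down to a single directive; the service name and binary path are hypothetical:

```ini
# /etc/systemd/system/transient-worker.service  (hypothetical)
[Service]
ExecStart=/usr/bin/worker
# Allocate a transient UID/GID from the dynamic range when the service
# starts, and release it again when the service stops
DynamicUser=yes
```

Because the user only exists while the service runs, no entry is ever added to /etc/passwd, and nothing can be left behind owned by a stale UID.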
Last but not least is systemd-homed, a really secure home directory management system where the user password is the encryption key. I could continue this list for a long, long time – after all, there are so many useful new features in systemd. If you want to know more, take a look at the NEWS file in the systemd sources, which is where we write everything down in more detail, while hopefully keeping things reasonably understandable.
Maybe a word about one last set of innovations: We recently added support for FIDO2, PKCS#11, and TPM2 security chips to systemd for disk encryption or user authentication. For the first time, this makes it possible to set up truly secure systems on Linux with practically on-board tools, without getting lost in massive manual scripting sessions or reducing security to passwords.
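One way this shows up in practice: after enrolling the TPM2 chip with systemd-cryptenroll, a single option in /etc/crypttab is enough for automatic unlocking. The volume name and UUID below are placeholders:

```
# /etc/crypttab  (volume name and UUID are placeholders)
# Enroll the TPM2 device first with:
#   systemd-cryptenroll --tpm2-device=auto /dev/disk/by-uuid/<UUID>
luks-root  UUID=<UUID>  none  tpm2-device=auto
```

The same mechanism accepts FIDO2 tokens (fido2-device=auto) or PKCS#11 smartcards instead of, or in addition to, a passphrase.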
LM: What else is on the wish list for the near future?
LP: Many people working on systemd have different interests. Personally, I have a great interest in simply making Linux even more secure, and, by that, I mean the classic, generic Linux distribution.
It hurts a bit that other operating systems like macOS or Windows currently protect user credentials better than we commonly do on Linux with our home directories. Even the non-traditional Linux systems like Chrome OS or Android are generally far better secured than classic Linux distributions because they detect and prevent offline modifications of the system, for example. Applications also run on them in relatively secure sandboxes by default. None of this really exists on classic Linux so far. There are projects in this vein, but only a few of them have reached the "mainstream" of Linux distributions so far.
This is exactly where I hope to improve the situation. The basic infrastructure is certainly provided by most distributions, but there is a lack of integration, of connecting the various subsystems to make them useful, which is exactly what the support for TPM2/PKCS#11/FIDO2, mentioned earlier, is aimed at. The subsystems for the respective technologies have existed for a long time, but few specialists actually use them together because the required integration with the rest of the operating system just never happened or was incomplete. I see systemd as the project that can do precisely that in a good way – determining where the journey should go and then integrating the subsystems needed to get there. For example, tying disk encryption to TPM2/PKCS#11/FIDO2 fits right into this scheme, but there is far more to be done in this area.
Thus, while many – possibly even most – users use disk encryption on Linux, typically typing the disk password does not protect the program code very well, which leaves you helplessly exposed to an evil maid attack, an offline attack in which someone simply exchanges the boot code of the system being attacked. You have no way to tell whether the cryptsetup binary to which you give your password is really the one you trust or perhaps a hijacked one that immediately sends the password to an attacker.
Other operating systems are doing much better, including Linux-based ones. I'd like to see us catch up there with generic Linux distributions so that the data on our laptops remains at least as secure in every way as, say, on a Chrome OS system. It's downright embarrassing that this is not yet the case. We need to do better, especially in this age of Pegasus and similar systematic security threats. I think systemd can and should play a certain role in making generic distributions more secure: more TPM2, meaningful secure boot, more sandboxing, more encryption, more integrity – and all without really demanding more knowledge from the admin.
Another related topic in this context is Rust: Sooner or later we should move away from C. It's just too hard to use the language correctly, and even the best developers make mistakes all the time. Rust is probably the first language that has a chance to replace C on a broad front. For systemd, that means we have to figure out how to make the transition as developers. We don't want to be pioneers but instead wait for other projects to solve the most pressing problems for us before we make the leap ourselves. After all, for us, a programming language is just a tool, not a purpose in itself.
LM: One of the systemd goals was to accelerate and standardize boot sequences. This goal can be considered achieved today with most distributions relying on systemd, but did this not happen at the cost of a far larger number of systems being affected by security-relevant errors than would be the case with more diversity?
LP: Acceleration was never the primary goal of systemd development but simply a side effect of the work to implement the boot process in a reasonably state-of-the-art way. We have emphasized this time and time again. We always try to find the balance between having a manageable, modular system while booting quickly and in a reasonably straightforward way. If in doubt, however, we have always opted for correctness and manageability.
By way of an example, we work a lot with small files in drop-in directories, such as unit files located in /usr/lib/systemd/system/*. These support modularity, so package managers can easily and elegantly add and remove components from the operating system. In terms of boot speed, this is more of a disadvantage: If we packed the service descriptions into a single large file instead of many small ones, they could almost certainly be read many times faster, but then nothing would be modular. However, modularity is more important to us than plain speed at boot time, so we went for drop-in files anyway.
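The same drop-in mechanism lets an administrator adjust a packaged unit without touching the vendor file; the service name here is hypothetical:

```ini
# /etc/systemd/system/bar.service.d/override.conf  (hypothetical unit)
# Settings here are merged over the packaged unit in /usr/lib/systemd/system/
[Service]
# Raise the file descriptor limit for this service only
LimitNOFILE=65536
```

The vendor unit stays untouched, so package upgrades and local configuration never conflict.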
I don't think it's a good idea to balance supposed speed advantages against security gains from more init system alternatives – they have nothing to do with each other. Sure, it would be good if there were convincing Linux init system alternatives to systemd – competition stimulates business, monoculture is not ideal – but I still believe that the very best thing for more computer security is better technical security strategies: more sandboxing, lockdown, integrity checks, and so on. You certainly don't do general computer security any favors by continuing to maintain multiple init systems that offer no security strategy at all. However, if we give systemd security features that are then widely used, that's worth far more at the end of the day.
To put it another way, I find it far more interesting to make one class of attacks completely impossible than to hope that "only" one half of the Linux world is vulnerable to it because the other half uses a slightly different system. On top of this, sure, there is definitely some interesting competition driving the security of computer systems, such as Chrome OS, Android, Windows, macOS, and so on. For us, this is highly relevant inspiration.
I would like to set one thing straight: Thus far, systemd actually looks very good when it comes to code quality and vulnerabilities. We have significantly fewer CVEs or the like (admittedly not a good metric) than other projects with similar numbers of lines of code. It should also be remembered that projects such as wpa_supplicant weigh in with more lines of code than systemd (even the kernel has many times that), so with all the components that come with systemd, the init system is not exactly the primary component to worry about. The attack surface of the WiFi stack or the kernel turns out to be far larger, so a monoculture there certainly causes bigger problems.
LM: Originally, systemd was intended as a replacement for the SysVinit system. In the meantime, however, it manages all kinds of resources, including its own out-of-memory (OOM) killer. In 2018, Facebook already came out with OOMd developed in-house as a competitor to the implementation in the kernel. What makes the systemd version better than the two predecessors?
LP: The systemd-oomd service integrated into systemd was programmed by Facebook developers. It is a simplified evolution of the old separate OOMd.
Systemd manages system services – that is its very specific task. Two facets of this management are lifecycle management and resource control (i.e., correct and clean startup and shutdown of services at the right times and the allocation of resources and their limits). An OOM service directly intervenes in exactly these two parts. Depending on individually configurable parameters, it shuts down services as needed. This works best when the OOM service and systemd agree on what to do. That's why we integrate strategies: systemd-oomd can analyze the system and become active; the systemd service manager knows about it and informs the administrator correctly.
Additionally, the following applies here: We always add components to systemd when we assume that the service will ultimately benefit a significant majority of users. This should also be the case with systemd-oomd. To use available resources in the best possible way under load, you need a service like OOMd. Unlike, say, the OOM killer in the kernel, it keeps an eye on the whole system. It tries everything to handle resource bottlenecks and the resulting latencies as locally as possible and not to affect the whole system. This is needed to utilize thick servers as fully as possible but also to achieve maximum performance in embedded systems with few resources – and helps on the desktop, as well. For the first time, you can no longer freeze your laptop with make -j on the wrong build tree.
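Enabling this per slice comes down to a couple of directives from systemd.resource-control(5); the drop-in path and the 50% threshold are illustrative, not recommendations:

```ini
# Drop-in for the root slice, e.g. /etc/systemd/system/-.slice.d/oomd.conf
[Slice]
# Let systemd-oomd kill the worst offender in this slice's cgroup tree
# when memory pressure stays above the limit for too long
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%
```

Because the policy is attached to cgroups rather than the whole machine, a runaway build tree can be reaped before it stalls unrelated services.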
LM: How does systemd fit into a world where applications are increasingly no longer launched directly from the operating system but in the form of containers?
LP: Here, too, you need an underlying operating system. The container strategy is (among other aspects) primarily about isolation from the host OS. However, extensive isolation from the host operating system is neither helpful nor possible for many applications. A service that makes extensive use of hardware can only be run in a container if you rely on hacks and workarounds.
Containers are without question very useful but are more for payloads than for system components. For the latter, you need infrastructure like that provided by systemd. The sandboxing offered by systemd for system services is ultimately inspired by container strategies, but it takes into account that complete isolation (e.g., a complete directory tree of its own) is more of a hindrance for system services. Therefore, it allows for far more modular sandboxing that tries to support integration, while still minimizing the attack surface for hacks as much as possible.
LM: What distinguishes containers launched by systemd-nspawn from portable services?
LP: The systemd-nspawn tool is versatile and so are portable services. Where one makes more sense than the other is not always clearly defined. Basically, though, I would say systemd-nspawn is about working in a similar way to lightweight VMs (virtual machines). For example, with Nspawn, a more or less complete Linux can be booted without any overhead, almost like in a VM. Portable services are more about making individual system services a bit more portable (i.e., making it easier to move relatively integrated system services between machines).
You could also say that the first program that runs as a payload in a VM is the operating system kernel. In an Nspawn container this is an OS init system instead, whereas in a portable service it is the main program of a service. The latter may resemble a Docker container, but Docker containers tend to run isolated from the host OS, which is not so much the case for portable services.
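For the VM-like case, a container image kept under /var/lib/machines can be given persistent settings in a .nspawn file, using options from systemd.nspawn(5); the machine name is hypothetical:

```ini
# /etc/systemd/nspawn/mycontainer.nspawn  (hypothetical machine name)
[Exec]
# Boot the image's own init system, much like a VM boots its kernel
Boot=yes

[Network]
# Give the container a private virtual Ethernet link to the host
VirtualEthernet=yes
```

With this in place, the container can be started and supervised like any other service, for example via machinectl start mycontainer.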
LM: Kubernetes and its offshoots such as OpenShift have become widely accepted for container management. What niche can systemd-nspawn best serve?
LP: systemd-nspawn can run containers, whereas Kubernetes orchestrates containers in clusters – two very different tasks. Kubernetes normally uses a tool like runc to run the containers. If you want, you could use systemd-nspawn instead of runc to do this; the infrastructure would lack very little. For example, systemd-nspawn already has direct support for running OCI containers onboard.
I personally have certain doubts about the Kubernetes approach. It seems to me that a lot of things have not been thought through to the end but glommed together with hot glue. That's why I haven't done anything yet to make systemd-nspawn usable as a back end for Kubernetes. I think such an approach would have advantages in terms of security and especially resource control.
Basically, however, Docker-style containers usually only run individual services, not the entire operating system. As mentioned before, the focus of systemd-nspawn is more on the latter. We want to make it easy to run full Linux userspaces in them, much like in a VM or on a physical system. So, the focus of systemd-nspawn is a bit different from runc and Kubernetes.
LM: You propose migratable home directories that bring the user account information right along with them. Does that only work if the user mounts their home directory on their own host? Who else would create such a directory on a portable medium? In the conventional system, the write protection for /etc/passwd ensures that a user cannot add their account to arbitrary groups, for example. If this information is located directly in the home directory, the user should not be allowed to edit it there. Who can do this if the directory is to be mounted on arbitrary hosts?
LP: Typically, home directories are still located on your own laptop's hard drive, but if you let systemd-homed manage them, you can also put them on, say, a USB stick and move them safely back and forth between different systems. I'm sure some users will find this helpful, but it's more of a side effect of the design and not the goal. I myself use systemd-homed to manage my home directory, but I just store it on my laptop's SSD.
The user records that systemd-homed manages are cryptographically signed, and the daemon only accepts records that match the local machine. This signing and verification takes place completely automatically, without the user having to do it manually. This means two things: First, users cannot easily modify their own user records unless they know the system's secret key, which is protected under /var and should therefore only be known to the system and root. Second, when moving a home directory from one system to another, you have to make sure that the signature key of the first machine is also accepted on the second machine, which can be done with a simple scp.
LM: Mr. Poettering, thank you very much for the insightful interview.
Infos
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/