Isolate workloads from Docker and Kubernetes with Kata Containers
Sealed Off
If DevOps adherents want to package and run applications separately, as the paradigm of continuous delivery dictates, they encounter a problem. Either they host their workloads on physical or virtualized servers in the old-fashioned way, or they entrust data and programs to the kind of containers that have recently become popular under the Docker label. Both approaches have advantages and disadvantages.
The advantage of physical servers is the high degree of isolation. In the case of a bare metal device, this is immediately obvious; for a virtual machine (VM), the hypervisor handles separation with help from the processor. In a peculiar twist of fate, the Meltdown and Spectre vulnerabilities have shown that even hardware is not infallible, but setting aside this unpleasant chapter on the limits of engineering skill, both physical servers and VMs separate workloads effectively, because the CPU is the only resource they share.
In contrast, all containers running on one host use the same kernel. When the individual workloads are small and plentiful, the savings potential is considerable, particularly with modern cloud-native applications that run as microservices.
Isolation of Resources
Data isolation is handled by a relatively new mechanism in the Linux kernel. Namespaces keep a record within the kernel of which process is allowed to access which resources (e.g., processes, network interfaces, or hard disk directories). Docker builds on these namespaces, controlling them and a few other kernel mechanisms with a convenient command-line tool.
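To see this isolation at work without Docker, you can create a new PID and mount namespace by hand. The following minimal sketch assumes the unshare and lsns tools from the util-linux package are installed:

# Start a shell in a fresh PID and mount namespace,
# with /proc remounted to match the new PID view
sudo unshare --pid --fork --mount-proc bash

# Inside the new namespace, only the shell and ps itself show up
ps ax

# Back on the host (after exit), list all PID namespaces
sudo lsns --type pid

Inside the namespace, ps reports just two processes, although the host kernel continues to run everything else – exactly the sleight of hand that Docker automates.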
DevOps thus effectively has the choice between heavyweight VMs that take a long time to boot and lightweight containers whose isolation, however cleverly implemented in the kernel, remains limited.
Starting from this situation, several initiatives have tried to combine the two approaches. The startup Hyper developed the runV project, which can be understood as a very lightweight hypervisor-based run time [1]. Intel took a similar direction with its Clear Containers project [2], which uses the VT-x technology built into most modern Intel CPUs.
Under the umbrella of the OpenStack Foundation, which for some time now has looked after more than just its eponymous cloud software and other infrastructure projects, developers brought together the ingredients from Hyper and Intel and created the Kata Containers project under the Apache license [3].
In May 2018, the project team released version 1.0; at the time of writing, version 1.5-rc2 is available for download. The project promotes the software with the slogan "The speed of containers, the security of VMs."
Compatible Run-Time Environment
Kata Containers avoids inventing a new application model and jumps on the Docker bandwagon instead. Under pressure from competitors and the community, Docker Inc., the company behind the software of the same name, removed the run time from Docker some time ago in favor of the interface defined by the Open Container Initiative (OCI), which operates under the auspices of the Linux Foundation. The OCI reference implementation, runC, ships with Docker.
In addition to runC, a number of alternatives exist, such as CRI-O (originally known as OCID), developed by Red Hat, or rkt (pronounced "rocket"), originally driven by CoreOS. Not all of these run times fully meet the OCI specification, but they use conceptually similar techniques. The real trick with Kata Containers is that it provides a hypervisor-based run time that differs from the classic runC only under the hood: The Docker commands, their meaning, and even the image formats and command-line parameters remain the same.
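Switching run times therefore boils down to registering Kata with the Docker daemon and selecting it per container. The following is a minimal sketch; the binary path /usr/bin/kata-runtime is an assumption and differs between distributions and Kata versions:

# /etc/docker/daemon.json: register the additional OCI run time
{
  "runtimes": {
    "kata-runtime": {
      "path": "/usr/bin/kata-runtime"
    }
  }
}

After a daemon restart, any ordinary image can be started inside its own lightweight VM:

sudo systemctl restart docker
docker run --rm --runtime kata-runtime alpine uname -r

The kernel version reported by uname differs from the host's, because the workload now runs on the guest kernel of the slimmed-down VM – with otherwise unchanged Docker semantics.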
For this to work, a few prerequisites must be met: Kata Containers only works on the x86 platform and requires the VT-x extension. The system running Kata Containers must also be able to launch its own hypervisor, which may well be an issue on virtualized hosts – for example, if they run as VMs in a public cloud where nested virtualization is not enabled.
Nested Virtualization
If you have access to the host system for your own virtualization setup, you can run

cat /sys/module/kvm_*/parameters/nested

to check whether this function is enabled. The command displays Y if it is and N otherwise. The shell wildcard is necessary because either one of the kernel modules – kvm_intel or kvm_amd – must be loaded to provide the function. Additionally, the hypervisor must have mode='host-model' set in the <cpu> tag of the guest's XML configuration.
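If nesting turns out to be disabled, it can usually be switched on with a module option, and the CPU mode is then set in the guest's libvirt definition. A minimal sketch, assuming an Intel CPU and a libvirt guest named kata-host (the name is hypothetical); use kvm_amd on AMD systems:

# Enable nested virtualization persistently
echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm-nested.conf

# Reload the module (no VMs may be running at this point)
sudo modprobe -r kvm_intel && sudo modprobe kvm_intel

# Adjust the guest definition
sudo virsh edit kata-host

In the editor that virsh opens, the <cpu> element of the guest should read:

<cpu mode='host-model'/>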
If you do not have access to the host system and nested virtualization is disabled, you might be able to use a bare metal service that some clouds offer. From a security point of view, this may be a good idea anyway, because cloud customers can then be sure that no neighbors in the cloud will gain access to their data, even in the event of incidents such as Meltdown or Spectre. However, this approach only pays off for really sensitive data, because the shared-economy advantages of the cloud are of course forfeited with this kind of server.