Keeping Docker containers safe
Weak Link
Contain Your Surprise
Host security, however, is only half the battle. Now, consider the fractionally less complex containers themselves. I say that securing the containers is less complex because one trick can reduce a container's attack vectors significantly – that is, running a container using the --read-only option:
$ docker run -d --read-only chrisbinnie/my-web-server
As you might expect, this launches a container to which you cannot write in any way, shape, or form. In other words, if your container is attacked, the attacker can't write to the application. This is not always a popular way to use containers, although it's recognized as a quick and highly effective way of reducing their risk to other containers on a system and the host machine itself.
Making a container read-only means, among other things, that when the container is stopped and restarted, the hack needs to take place all over again to be effective, because it can't be written into your application. I think of these containers as Knoppix-style boot disks or ROM (Read-Only Memory) [10].
Consider that even opening a page in a web browser needs to write session data to your desktop machine. If you make a container read-only, then although you can read all the data you want from a container's processes, you need to provide other ways of writing data. You might try to save to the host itself, but a better method would likely be to write to a permanent or temporary storage device, depending on the type of data you're dealing with. For example, ephemeral session data is thrown away most of the time, but storing input from a user into an application that isn't written to a database might suit Amazon S3 [11] or sophisticated, redundant, off-host storage like Ceph or the more cloud-friendly GlusterFS [12].
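As a minimal sketch of giving a read-only container somewhere to write, the following script assembles a docker run command that keeps the root filesystem read-only while adding in-memory scratch space and a named volume. The image name comes from the example above; the mount points /tmp, /run, and /var/lib/app and the volume name web-data are assumptions chosen purely for illustration.

```shell
#!/bin/sh
# Sketch only: the tmpfs mount points and the named volume "web-data"
# are hypothetical and need adjusting to whatever your application writes.
IMAGE="chrisbinnie/my-web-server"

# --read-only  mounts the container's root filesystem read-only
# --tmpfs      adds in-memory scratch space that is discarded when the container stops
# -v           attaches a named volume for data that must outlive the container
READONLY_CMD="docker run -d --read-only --tmpfs /tmp --tmpfs /run -v web-data:/var/lib/app $IMAGE"

# Print the command for review; run it with: eval "$READONLY_CMD"
echo "$READONLY_CMD"
```

Ephemeral data such as session files would land in the tmpfs mounts, whereas anything worth keeping goes to the volume, which could in turn be backed by off-host storage.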
It's What's Inside that Counts
Security used to be much worse when it came to the internals of a container. Up until Docker v1.10, a host's root user was also tied to the container's root user, which could cause all sorts of chaos, such as being able to load a kernel module dynamically into the kernel and do any damage to the host machine that you want. Thankfully, Docker has addressed this issue skillfully, but getting it to work requires overhead. The official line [13] from the Docker site is:
As of Docker 1.10 User Namespaces are supported directly by the docker daemon. This feature allows for the root user in a container to be mapped to a non uid-0 user outside the container, which can help to mitigate the risks of container breakout. This facility is available but not enabled by default.
It's highly recommended that you enable this functionality along with that of running "untouchable" containers that are read-only. A blog post [14] has information on how to map the root user's UID 0 to another UID and discusses how each tenant on a host can run their own range of UID and GID values without overlapping into the territories of others, which would otherwise raise further security concerns.
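To make the idea of remapped, non-overlapping ID ranges more concrete, the following sketch works through the arithmetic by hand. It assumes ranges written in the name:first_host_uid:count format used by /etc/subuid; the tenant names and numbers are hypothetical, and the map_uid helper is an illustration rather than part of Docker.

```shell
#!/bin/sh
# map_uid SUBUID_ENTRY CONTAINER_UID
# Prints the host UID that CONTAINER_UID corresponds to under the given
# /etc/subuid-style entry (name:first_host_uid:count).
map_uid() {
    first=$(printf '%s\n' "$1" | cut -d: -f2)
    count=$(printf '%s\n' "$1" | cut -d: -f3)
    # IDs beyond the size of the range have no mapping on the host
    if [ "$2" -ge "$count" ]; then
        echo "unmapped"
        return 1
    fi
    echo $((first + $2))
}

# Hypothetical, non-overlapping ranges for two tenants
TENANT_A="tenant_a:100000:65536"
TENANT_B="tenant_b:165536:65536"

map_uid "$TENANT_A" 0      # container root for tenant A -> 100000
map_uid "$TENANT_B" 0      # container root for tenant B -> 165536
map_uid "$TENANT_A" 1000   # an unprivileged container user -> 101000
```

Root inside either container ends up as an unprivileged, tenant-specific UID on the host, which is the effect the daemon's user namespace support is meant to achieve.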
Mitigation Techniques
Moving away from doom and gloom, I'll now spend some time looking at how you can improve your security on Docker hosts and containers alike. You might be surprised at the number of additions you need to make to your Docker config to mitigate the many types of attacks. I will continue by touching on a few of them briefly to give you food for thought, in the hope that you can investigate further, because the list is extensive and a little daunting, especially when written in detail.
- Improve your host and container logging (e.g., with a centralized, alerting Syslog server).
- Run through the Center For Internet Security (CIS) hardening document for your daemon [15].
- Run Docker on a host by itself, and don't introduce further issues with other on-host applications; in other words, keep a stripped-down, minimal install of all other packages on the host.
- Run your Docker daemon from a Unix socket, as recommended in modern versions.
- Re-map your root UID, despite the sometimes time-consuming overhead.
- Lean on the --read-only option, and don't let anyone tell you otherwise; store your data off-host.
- Never run privileged containers unless in development, because they give unmitigated root user access on the host itself by design.
- Limit CPU usage per container and define maximum RAM usage to limit attacks on a container affecting others.
- Use SELinux, if possible; otherwise, use AppArmor [16], grsecurity [17], or PaX [18] to lock down unexpected system resource access.
- Patch your systems more often than usual; even official Docker images can be riddled with known vulnerabilities.
- Don't run containers with --cap-add=ALL; instead, shut all extended container capabilities down and then explicitly open up only the ones you need. Listing 1 shows how to switch off every capability that is enabled by default, which limits the host's exposure to exploits on the container. Table 1 lists capabilities that are not included by Docker by default but that can be enabled, and Table 2 lists capabilities that are enabled by default but that can be disabled [19].
- Limit container-to-container communications by starting the Docker daemon with --icc=false.
- Check your configuration with the extensible Docker Bench for Security tool [20].
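The advice above on limiting CPU and RAM per container could be sketched as follows. The limit values are assumptions picked for illustration, not recommendations; --memory sets a hard memory ceiling, and --cpu-shares sets a relative CPU weight (Docker's default weight is 1024).

```shell
#!/bin/sh
# Sketch only: the limit values below are hypothetical and need tuning per workload.
IMAGE="chrisbinnie/my-web-server"
MEMORY_LIMIT="256m"   # hard cap on the container's RAM
CPU_SHARES="512"      # half the default weight of 1024 when CPU is contended

LIMITS_CMD="docker run -d --memory=$MEMORY_LIMIT --cpu-shares=$CPU_SHARES $IMAGE"

# Print the command for review; run it with: eval "$LIMITS_CMD"
echo "$LIMITS_CMD"
```

With limits like these in place, a compromised or runaway container is less able to starve its neighbors of resources.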
Table 1
Capabilities Not Included by Default
Capability Key | Capability Description |
---|---|
SYS_MODULE | Load and unload kernel modules. |
SYS_RAWIO | Perform I/O port operations (iopl(2) and ioperm(2)). |
SYS_PACCT | Use acct(2) to switch process accounting on or off. |
SYS_ADMIN | Perform a range of system administration operations. |
SYS_NICE | Raise the process nice value (nice(2), setpriority(2)) and change the nice value for arbitrary processes. |
SYS_RESOURCE | Override resource limits. |
SYS_TIME | Set system clock (settimeofday(2), stime(2), adjtimex(2)); set real-time (hardware) clock. |
SYS_TTY_CONFIG | Use vhangup(2); employ various privileged ioctl(2) operations on virtual terminals. |
AUDIT_CONTROL | Enable and disable kernel auditing; change auditing filter rules; retrieve auditing status and filtering rules. |
MAC_OVERRIDE | Override Mandatory Access Control (MAC). Implemented for the Smack Linux Security Module (LSM). |
MAC_ADMIN | Allow MAC configuration or state changes. Implemented for the Smack LSM. |
NET_ADMIN | Perform various network-related operations. |
SYSLOG | Perform privileged syslog(2) operations. |
DAC_READ_SEARCH | Bypass file read permission checks and directory read and execute permission checks. |
LINUX_IMMUTABLE | Set the FS_APPEND_FL and FS_IMMUTABLE_FL inode flags. |
NET_BROADCAST | Make socket broadcasts and listen to multicasts. |
IPC_LOCK | Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)). |
IPC_OWNER | Bypass permission checks for operations on System V IPC objects. |
SYS_PTRACE | Trace arbitrary processes using ptrace(2). |
SYS_BOOT | Use reboot(2) and kexec_load(2) to reboot and load a new kernel for later execution. |
LEASE | Establish leases on arbitrary files (see fcntl(2)). |
WAKE_ALARM | Trigger something that will wake up the system. |
BLOCK_SUSPEND | Employ features that can block system suspend. |
Table 2
Capabilities Enabled by Default
Capability Key | Capability Description |
---|---|
SETPCAP | Modify process capabilities. |
MKNOD | Create special files using mknod(2). |
AUDIT_WRITE | Write records to kernel auditing log. |
CHOWN | Make arbitrary changes to file UIDs and GIDs (see chown(2)). |
NET_RAW | Use RAW and PACKET sockets. |
DAC_OVERRIDE | Bypass file read, write, and execute permission checks. |
FOWNER | Bypass permission checks on operations that normally require the file system UID of the process to match the UID of the file. |
FSETID | Don't clear set-user-ID and set-group-ID permission bits when a file is modified. |
KILL | Bypass permission checks for sending signals. |
SETGID | Make arbitrary manipulations of process GIDs and supplementary GID list. |
SETUID | Make arbitrary manipulations of process UIDs. |
NET_BIND_SERVICE | Bind a socket to Internet domain privileged ports (port numbers <1024). |
SYS_CHROOT | Use chroot(2), change root directory. |
SETFCAP | Set file capabilities. |
Listing 1
Shutting down container capabilities
$ docker run -d --cap-drop=CHOWN --cap-drop=DAC_OVERRIDE \
    --cap-drop=FSETID --cap-drop=FOWNER --cap-drop=KILL \
    --cap-drop=MKNOD --cap-drop=NET_RAW --cap-drop=SETGID \
    --cap-drop=SETUID --cap-drop=SETFCAP --cap-drop=SETPCAP \
    --cap-drop=NET_BIND_SERVICE --cap-drop=SYS_CHROOT \
    --cap-drop=AUDIT_WRITE chrisbinnie/my-web-server
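A shorter route to the same "drop everything, then open up explicitly" approach is to pair --cap-drop=ALL with one --cap-add per capability the application needs. The sketch below builds those flags with a small helper; the build_cap_flags function is a hypothetical convenience, and the choice of NET_BIND_SERVICE assumes a web server that only needs to bind to a privileged port.

```shell
#!/bin/sh
# build_cap_flags CAP...
# Prints flags that drop every capability and add back only those named.
build_cap_flags() {
    flags="--cap-drop=ALL"
    for cap in "$@"; do
        flags="$flags --cap-add=$cap"
    done
    echo "$flags"
}

IMAGE="chrisbinnie/my-web-server"

# Assumption: the web server needs nothing beyond binding to ports below 1024.
# Print the command for review before running it.
echo "docker run -d $(build_cap_flags NET_BIND_SERVICE) $IMAGE"
```

Starting from an empty set and adding capabilities back one at a time makes it obvious which privileges a container really depends on.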