Capsicum – Additional seasoning for FreeBSD
Hot and Spicy
Administrators often break into a sweat when they read security bulletins explaining the malicious code that is currently in the wild for the programs they use. Web browsers, email programs, archiving tools, and even Office packages are affected. It is not just negligence in the use of libraries that makes it easy for intruders to execute malicious code, but also targeted attacks on vulnerable applications. The mechanisms known on FreeBSD (chroot or jails) are not really an answer.
One remedy is to lock applications up in a sandbox, an environment that provides only very limited resources. Because FreeBSD up to version 8 did not provide such a mechanism, the Capsicum framework was introduced in FreeBSD 9. In addition to a protected environment (sandbox) from which applications cannot break out, it supports fine-grained allocation of rights.
Traditional Access Authorization
FreeBSD, Linux, and other Unix systems traditionally have had a very simple permissions system. To find the reason for this, you have to look back at early Unix systems, which were not originally designed for a desktop networked into the global Internet. This resulted in two main mechanisms of access control. The first mechanism is discretionary access control (DAC), which depends on the user ID. Here, the decision of whether a resource can be accessed is made solely on the basis of the user's identification. This means that, for each user, access rights to data are set by an administrator or by the user. The best example of this is a home directory, to which only the user has access.
The disadvantage of the method is demonstrated by the passwd command for changing the user password. Because users can assign themselves a password or change their password in the user database, the passwd command needs write access to the /etc/passwd file. Only the root user has permission to change that file, however. Put simply, you use a trick and set the SUID flag for the passwd command; the command is then executed with root privileges and the change to /etc/passwd is carried out. Under certain circumstances, this mechanism can be misused as a gateway for malicious software.
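The SUID flag described above is an ordinary mode bit that any program can inspect. The following is a minimal sketch, assuming a POSIX system; the helper names mode_has_setuid() and file_has_setuid() are made up for this illustration and are not part of any FreeBSD tool.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if the mode bits contain the set-user-ID flag, otherwise 0. */
int mode_has_setuid(mode_t mode)
{
    return (mode & S_ISUID) != 0;
}

/*
 * Returns 1 if the file at `path` carries the SUID flag, 0 if it does not,
 * and -1 if stat() fails (errno is left as set by stat()).
 */
int file_has_setuid(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    return mode_has_setuid(st.st_mode);
}
```

Calling file_has_setuid() on the passwd binary of a stock installation would be expected to return 1, although the exact path of that binary depends on the system.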
The other mechanism for controlling access is mandatory access control (MAC). Here, access is granted on the basis of a set of rules. The disadvantage of this method is that such rules must be defined within the application, resulting in greater programming overhead. The programmer also bears full responsibility for assigning permissions.
These two types of access control are designed primarily to regulate unauthorized access to files. This approach does not prevent access to storage areas or even control structures of a kernel. Also, the mechanisms were never designed to cover modern desktop applications such as web browsers or office packages, which is critical when you consider that such applications process and display information originating from dubious sources. With DAC or MAC, the execution of malicious code in JavaScript or macro viruses can be difficult to prevent.
FreeBSD connoisseurs will point out that there are jails in which you have the option of building a sandbox. That is correct, but the administrative overhead and resource usage would be enormous if you created a jail for each application. Also, jails don't solve the problem of malicious code infiltrating a system.
Another possibility is that of breaking down an application into smaller processes that can be launched by the main process and equipping them with special access rights.
Figure 1 shows a safe environment for a web server based on the example of the Apache HTTP daemon. When Apache is started, the main process has all rights necessary to access the configuration files and the complete directory structure. Additionally, sockets are created that allow web browsers to retrieve web pages. After this basic configuration is complete, subprocesses that handle the actual task of the HTTP daemon are then started. Each subprocess is given permission to access the directory and resources allocated to it. This setup means that the process runs in a sandbox.
Programming an HTTP daemon hardened in this way involves considerable effort because access mechanisms must be specifically implemented for every Unix system or BSD operating system.
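The basic pattern of a main process that opens a resource and hands it to a worker can be sketched in portable C. This is only an illustration under stated assumptions: the function names worker_count_bytes() and count_file_bytes_isolated() are hypothetical, and the worker here merely counts bytes. Note that a plain fork() does not restrict the worker in any way; it still has the full rights of the parent, which is exactly the gap a sandbox has to close.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a worker that reads the inherited descriptor `fd` until EOF and
 * reports the number of bytes through a pipe.
 * Returns the byte count, or -1 on any failure.
 */
long worker_count_bytes(int fd)
{
    int result[2];
    int status;
    long total = -1;
    pid_t pid;

    if (pipe(result) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(result[0]);
        close(result[1]);
        return -1;
    }

    if (pid == 0) {
        /* Worker: uses only the descriptors it inherited. */
        char buf[4096];
        ssize_t n;
        long count = 0;

        close(result[0]);
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            count += n;
        if (n < 0)
            _exit(1);
        if (write(result[1], &count, sizeof(count)) != (ssize_t)sizeof(count))
            _exit(1);
        _exit(0);
    }

    /* Main process: collect the result and reap the worker. */
    close(result[1]);
    if (read(result[0], &total, sizeof(total)) != (ssize_t)sizeof(total))
        total = -1;
    close(result[0]);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    return total;
}

/* The main process opens the file by name; the worker never sees the path. */
long count_file_bytes_isolated(const char *path)
{
    long total;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    total = worker_count_bytes(fd);
    close(fd);
    return total;
}
```

The design point is that the worker receives an already opened descriptor rather than a file name, so the decision about which resource it may touch stays with the main process.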
Chili Pepper
FreeBSD offers a solution for this problem in the form of Capsicum. FreeBSD serves as a reference platform here, not only for other BSD systems, but also for other Unix platforms. Capsicum in FreeBSD was implemented in the scope of the Google Summer of Code. Many kudos are owed to Pawel Jakub Dawidek (pjd) and his colleagues on the FreeBSD development team for their support and implementation of the project.
In the development of the Capsicum framework, the problems mentioned here were addressed, and new security features were introduced to harden applications. To fully exploit the benefits of Capsicum, you need to restructure the code at the very least and, in the worst case, redevelop the application. Restructuring code is not necessarily a bad thing.
The focus of development of Capsicum was for existing access control mechanisms to remain functional without any changes. Additionally, the idea was for the application programming interface (API) to remain unchanged so that existing software would continue to work without any restrictions. Therefore, the Capsicum system extends the Unix programming interface by implementing its own functions within the operating system kernel.
To use Capsicum in your own applications and with operating system tools, you can build on the C header files (sys/capability.h, libcapsicum.h) and the libcapsicum library, which communicates with the kernel extensions.
To understand Capsicum, some non-trivial basics need to be explained. Capsicum supports what is known as capability mode, which is a flag set by the cap_enter() function. It indicates that all file and storage operations are now highly regulated. This flag is inherited by all child processes and cannot be cleared.
Processes that are in capability mode have only very limited access to the kernel namespace (Table 1). Additionally, some system interfaces are protected. This includes all device drivers that allow access to the physical memory or PCI bus. Also, commands such as reboot or kldload can be blocked.
Table 1
Global Namespace of the FreeBSD Kernel
Namespace | Explanation |
---|---|
Process ID (PID) | Unix processes are represented by unique identifiers. PIDs are returned at the start of a process and can be used for debugging, to send signals, for monitoring, and to determine the current state. |
File paths | Unix files exist in a global, hierarchical namespace that is protected by DAC and MAC. |
NFS file handles | Both NFS clients and NFS servers use file handles to identify files and directories. NFS access management also relies on these. |
Filesystem IDs | These determine the mapping of mountpoints to paths and are used to perform a forced unmount if a path no longer exists. |
Protocol addresses | The protocol families use socket addresses to refer to local or remote network endpoints. They exist in the global namespace, as do IPv4 addresses and ports or sockets. |
Sysctl MIBs | The sysctl management system uses both numeric and alphanumeric entries to read or change system parameters. |
System V IPC | Message queues, semaphores, and shared memory are used for interprocess communication and are handled according to the System V standard. |
POSIX IPC | Message queues, semaphores, and shared memory are used for interprocess communication and are handled according to the POSIX standard. |
System clocks | FreeBSD systems provide several interfaces for managing the system clock. |
Jail | Jails as FreeBSD-based virtualization use their own namespace as a subset of the global namespace. |
CPU sets | Assignments between CPU resources and processes and threads. |
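The file path namespace from Table 1 offers a simple way to observe capability mode. The following sketch forks a child, calls cap_enter() in it, and then tries to open a file by name, which should be refused. It assumes the header sys/capsicum.h, as found on FreeBSD 10 and later (FreeBSD 9 uses sys/capability.h); on other systems the function only reports that capability mode is unavailable. The function name is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>   /* assumption: FreeBSD 10 or later */
#endif

/*
 * Forks a child, switches it to capability mode, and lets it try to open
 * `path` by name. The parent itself never enters capability mode.
 * Returns 1 if the open was refused, 0 if it succeeded, and -1 if
 * capability mode is unavailable or the check could not be carried out.
 */
int path_open_blocked_in_capability_mode(const char *path)
{
#ifdef __FreeBSD__
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd;

        if (cap_enter() < 0)
            _exit(2);               /* kernel without capability mode */
        fd = open(path, O_RDONLY);  /* lookup in the global namespace */
        _exit(fd < 0 ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
#else
    (void)path;                     /* no Capsicum on this platform */
    return -1;
#endif
}
```

Running the check in a child keeps the calling process usable afterward, because the capability mode flag cannot be removed once it is set.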
Calls to system functions are also regulated in capability mode. Some features that have access to the global namespace are no longer available, whereas others have limited access. An example is the sysctl command and its counterpart sysctl() in the libc programming library: With this command, you can query memory allocation, sniff network connections, or modify kernel parameters, and it can provide potential attackers a vector for an attack or monitoring. To increase security, access was restricted to just 30 parameters, compared with the 3,000 parameters that sysctl() offers. Simply by enabling capability mode, you create a sandbox from which applications cannot break out.
In addition to capability mode, Capsicum also introduces fine-grained permissions without abandoning the previous system of permissions (Figure 2). This trick was possible because the developers expanded the structure of the file descriptor. A file descriptor is a small integer, unique within a process, that refers to a data structure in the kernel. This data structure, also known as metadata, includes permissions as well as the file name. The best-known file descriptors are STDIN (standard input), STDOUT (standard output), and STDERR (standard output for error messages).
The previously used file descriptors already contain the FreeBSD permissions. These are immutable characteristics that can be inherited by child processes. However, in terms of security, their disadvantage is that they allow manipulation of metadata, even if a file or a device has been opened for exclusive read or write operation.
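The point about metadata can be reproduced with a few POSIX calls: a descriptor opened read-only is still sufficient to change the mode of the file it refers to. The sketch below is an illustration, not code from Capsicum; the function name and the scratch file template under /tmp are assumptions made for this example. In Capsicum, the CAP_FCHMOD right governs this operation.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Creates a scratch file, reopens it read-only, and changes its mode
 * through that descriptor.
 * Returns 1 if the mode was changed, 0 if the change was refused,
 * and -1 if the scratch file could not be set up or checked.
 */
int readonly_fd_can_change_mode(void)
{
    char path[] = "/tmp/capsicum-demo-XXXXXX";
    struct stat st;
    int result = -1;
    int fd;

    fd = mkstemp(path);             /* file is created with mode 0600 */
    if (fd < 0)
        return -1;
    close(fd);

    fd = open(path, O_RDONLY);      /* no write access requested */
    if (fd >= 0) {
        if (fchmod(fd, 0640) < 0)
            result = 0;
        else if (fstat(fd, &st) == 0)
            result = ((st.st_mode & 0777) == 0640);
        close(fd);
    }
    unlink(path);
    return result;
}
```

For the owner of the file, the function is expected to return 1, which shows that the open mode alone does not protect the metadata.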
At this point, Capsicum extends the data structure associated with the file descriptor. Once cap_enter() is called by an application, all file descriptors use the extended data structure. As soon as this kind of file descriptor is used, the kernel checks to see whether everything is correct when accessing the hardened unit.
For developers of applications that use the Capsicum system, this step is important, because you have to decide whether to allow access that is already blocked by cap_enter(), be even more restrictive, or add even more rules. This is done by calling cap_new(), which expects an existing file descriptor and the permissions that you want to set as parameters. It doesn't matter whether the file descriptor was created for files, Unix or network sockets, directories, or devices. The man page for cap_new() lists all the available permissions, which are OR'd and then passed to Capsicum. The man page also lists numerous system functions of the libc C library that are affected by Capsicum.
Capsicum therefore requires that you plan your applications carefully. This task is certainly not trivial, because it requires very precise analysis of the resources, including the use of protected shared memory instead of a publicly accessible shared memory area for exchanging data. Capsicum gives the programmer the freedom of choice to use FreeBSD's permission system or the libcapsicum library.
Hot tcpdump
Applications with dubious privileges can be revamped so that they use cap_enter() directly. This approach creates an application whose individual processes run in capability mode and inherit special permissions via their file descriptors. It works well for simple applications that operate on the basis of the following schema: Open all resources and process all incoming and outgoing data in a loop, like a Unix pipeline or through interaction with a network. The speed hit from Capsicum is very low if you restrict permissions when accessing the resources.
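The schema of opening resources first and then looping over the data can be sketched as two small functions. This is a simplified illustration under assumptions: enter_sandbox() and copy_stream() are hypothetical names, sys/capsicum.h is the header found on FreeBSD 10 and later, and on other systems enter_sandbox() simply fails with ENOSYS.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>   /* assumption: FreeBSD 10 or later */
#endif

/*
 * Called after all resources are open: gives up access to the global
 * namespaces. Returns 0 on success, -1 with errno set on failure.
 */
int enter_sandbox(void)
{
#ifdef __FreeBSD__
    return cap_enter();
#else
    errno = ENOSYS;         /* no capability mode on this platform */
    return -1;
#endif
}

/*
 * Processing loop: copies everything from in_fd to out_fd using only
 * descriptors that were opened beforehand.
 * Returns the number of bytes copied, or -1 on a read or write error.
 */
long copy_stream(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t n;
    long total = 0;

    while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;

        while (off < n) {   /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

A filter built this way would open its inputs and outputs, call enter_sandbox(), and then run copy_stream(STDIN_FILENO, STDOUT_FILENO); the loop needs nothing beyond the descriptors it already holds.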
The FreeBSD network analysis tool tcpdump illustrates this approach in detail. Tcpdump is built in line with the schema I just mentioned and therefore is easy to convert to Capsicum: The program uses the Berkeley packet filter bpf to analyze the data transported over a network. To do so, tcpdump passes a search pattern to the packet filter. In the next step, the filter is defined as an input source to send the information to tcpdump for further processing. Finally, the incoming data are interpreted, reprocessed, and displayed on the console in a loop.
Thus, the application can be switched to Capsicum capability mode with just two additional lines of program code:
if (cap_enter() < 0)
    error("cap_enter: %s", pcap_strerror(errno));
These two lines are inserted in front of the loop that carries out the traffic analysis:
status = pcap_loop(pd, cnt, callback, pcap_userdata);
This approach improves security considerably. Code that parses and analyzes data packets is a typical source of vulnerabilities because memory access is often handled by C pointers and copy actions. As explained above, calling cap_enter() prevents access to privileged memory areas.
To restrict communication with standard devices (STDIN, STDOUT, and STDERR) as well, you need to insert Listing 1 before the first call of cap_enter().
Listing 1
Restricting Standard Channels
/* Standard input: status queries only, no reads */
if (cap_rights_limit(STDIN_FILENO, CAP_FSTAT) < 0)
    error("cap_rights_limit: unable to limit STDIN_FILENO");
/* Standard output and standard error: status, seek, and write */
if (cap_rights_limit(STDOUT_FILENO, CAP_FSTAT | CAP_SEEK | CAP_WRITE) < 0)
    error("cap_rights_limit: unable to limit STDOUT_FILENO");
if (cap_rights_limit(STDERR_FILENO, CAP_FSTAT | CAP_SEEK | CAP_WRITE) < 0)
    error("cap_rights_limit: unable to limit STDERR_FILENO");
With the cap_rights_limit() function used here, read access to the STDIN device is prevented, whereas write operations are allowed to the standard output devices, STDOUT and STDERR.
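Listing 1 passes the rights as an OR'd mask. On FreeBSD 10 and later, cap_rights_limit() instead takes a pointer to a cap_rights_t that is filled in with cap_rights_init(). The following sketch shows the STDOUT and STDERR rights from Listing 1 in that form; the function name limit_to_output_rights() is hypothetical, and on systems other than FreeBSD the function only reports ENOSYS.

```c
#include <errno.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>   /* assumption: FreeBSD 10 or later */
#endif

/*
 * Limits `fd` to status queries, seeking, and writing, the same rights
 * Listing 1 grants to STDOUT and STDERR.
 * Returns 0 on success, -1 with errno set on failure.
 */
int limit_to_output_rights(int fd)
{
#ifdef __FreeBSD__
    cap_rights_t rights;

    cap_rights_init(&rights, CAP_FSTAT, CAP_SEEK, CAP_WRITE);
    /* ENOSYS means the kernel was built without Capsicum; tolerate it. */
    if (cap_rights_limit(fd, &rights) < 0 && errno != ENOSYS)
        return -1;
    return 0;
#else
    (void)fd;
    errno = ENOSYS;         /* no Capsicum on this platform */
    return -1;
#endif
}
```

Rights can only be reduced this way, never extended, so the call is safe to make more than once on the same descriptor.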
Analysis with the FreeBSD command procstat using the -C parameter confirms these facts, as shown in Figure 3. In the first and second columns, you can see the process ID and the name of the process; the third column shows the file descriptor. In this example, these are standard input (FD=0), standard output (FD=1), standard error output (FD=2), and the bpf driver for the Berkeley packet filter (FD=3). The fourth column describes the type of file descriptor, and the FLAGS column shows what FreeBSD permissions are set. The letter c shows that Capsicum is active for this file descriptor. The CAPABILITIES column indicates which Capsicum permissions are used. Specifying fs (CAP_FSTAT) means that the status of the file descriptor can be queried; wr (CAP_WRITE) stands for write permission, and se (CAP_SEEK) means that the file pointer can be set. An overview of all Capsicum permissions can be found online [2]. The last two columns show the protocol and the device driver used for each file descriptor.
Using Capsicum does have one clearly visible, unpleasant side effect, especially with tcpdump: Access to the name service switch is blocked. In the case of tcpdump, this information is needed to convert IP addresses into fully qualified hostnames. You can work around this shortcoming by sending requests to a local domain name server.