Managing Linux Memory
Memory Hogs
Your Own Application
One very obvious measure for reducing memory problems is to keep your application small and to release memory promptly. However, this practice usually requires complex, application-specific planning. From a developer's point of view, it may seem tempting to prevent your application's pages from being swapped out of physical memory. Some methods for doing so require changes to the code; others are transparent to the application and require just a little administrative work.
The well-known mlock() [6] system call belongs to the first group. It lets you explicitly pin an area of the virtual address space in physical memory. Swapping is then no longer possible, not even in exceptional cases where it might be justified. Additionally, this approach requires special attention, especially in cross-platform programming, because mlock() makes special demands on the alignment of data areas (POSIX allows implementations to require page-aligned addresses). This requirement, together with the overhead of explicitly modifying the program code, makes mlock() unsuitable in many environments.
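A minimal sketch of pinning a buffer with mlock() might look as follows. It assumes Linux with glibc; the helper names alloc_pinned() and free_pinned() are hypothetical. The buffer is rounded up to whole pages and allocated page-aligned to satisfy the alignment requirement mentioned above. Whether mlock() succeeds depends on the RLIMIT_MEMLOCK limit (or the CAP_IPC_LOCK capability), so the sketch reports a failure instead of aborting.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocates a page-aligned buffer of at least `size` bytes and tries to pin
   it in physical memory with mlock(). *locked is set to 1 if pinning worked
   and 0 otherwise (e.g. RLIMIT_MEMLOCK too low); *out_len receives the
   rounded length. Returns NULL only if the allocation itself fails. */
void *alloc_pinned(size_t size, size_t *out_len, int *locked)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* Round up to whole pages so the locked range covers exactly the buffer. */
    size_t len = (size + page - 1) / page * page;
    void *buf = NULL;
    if (posix_memalign(&buf, page, len) != 0)
        return NULL;
    memset(buf, 0, len);              /* touch every page so it is resident */
    *locked = (mlock(buf, len) == 0); /* locked pages are never swapped out */
    *out_len = len;
    return buf;
}

/* Unpins (if it was pinned) and frees a buffer from alloc_pinned(). */
void free_pinned(void *buf, size_t len, int locked)
{
    if (locked)
        munlock(buf, len);
    free(buf);
}
```

Falling back to an unpinned buffer when mlock() fails is a choice made for this sketch; an application that strictly depends on pinning would treat that case as a fatal error.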
Another approach, which also requires changes to the code as well as system administration, relies on huge pages, which the Linux kernel has supported in principle since 2003 [7]. The basic idea behind huge pages is simple: Instead of many small pages (typically 4KB), the operating system uses fewer but larger pages, for example, 2MB on x86_64 platforms.
This approach reduces the number of page-to-page-frame mappings, and thus the load on the CPU's translation lookaside buffer (TLB), which can noticeably speed up memory access. Because of the way huge pages are implemented, Linux cannot swap them out, so they also help protect your own pages. Unfortunately, however, huge pages are cumbersome to use. For example, to provide shared memory, you need to set up a separate hugetlbfs filesystem. Shared memory areas are stored there as files that are backed by huge pages in main memory.
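As an illustration, the following sketch requests anonymous memory backed by explicit huge pages through mmap() with the Linux-specific MAP_HUGETLB flag (available since kernel 2.6.32); for private anonymous memory like this, no separate hugetlbfs mount is needed. It assumes 2MB huge pages, as on x86_64, and that an administrator has reserved pages beforehand, for example with the vm.nr_hugepages sysctl. If none are available, the hypothetical helper map_prefer_huge() falls back to normal pages.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MB huge pages (the x86_64 default). */
#define HUGE_PAGE_SIZE ((size_t)2 * 1024 * 1024)

/* Huge page mappings must be a multiple of the huge page size. */
static size_t round_up_huge(size_t len)
{
    return (len + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Maps `len` bytes of anonymous memory, preferring reserved huge pages.
   *huge reports which variant was used. Returns MAP_FAILED on error. */
void *map_prefer_huge(size_t len, int *huge)
{
    void *p = mmap(NULL, round_up_huge(len), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *huge = 1;
        return p;
    }
    /* Typically ENOMEM when no huge pages were reserved: use normal pages. */
    *huge = 0;
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* munmap() on huge page mappings needs the huge-page-aligned length. */
int unmap_prefer_huge(void *p, size_t len, int huge)
{
    return munmap(p, huge ? round_up_huge(len) : len);
}
```

The fallback mirrors the reservation problem described next: without pages reserved in advance, MAP_HUGETLB requests simply fail.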
The bigger problem is that you cannot reliably allocate huge pages on the fly later on, because each of them needs a physically contiguous memory area. In practice, you often have to reserve huge pages right after boot, which means the memory associated with them is not available to applications without huge page support. The price is reduced flexibility, especially for high-availability solutions such as failover scenarios, but also in general operation. The complexity of huge page management prompted the development of a simpler, largely automated method for using larger pages.
This method made its way into kernel 2.6.38 in 2011 in the form of Transparent Huge Pages (THP). As the name suggests, the use of THP is invisible to users and developers alike. The implementation generally performs slightly worse than explicit huge pages. However, unlike explicit huge pages, THPs can be swapped out just like normal pages.
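How transparent THP really is depends on the system-wide mode in /sys/kernel/mm/transparent_hugepage/enabled: In always mode, the kernel uses huge pages for anonymous memory wherever possible; in madvise mode, only regions flagged with madvise(MADV_HUGEPAGE) qualify; never disables THP. The following sketch, with the hypothetical helpers thp_mode() and map_thp_hint(), reads the active mode and maps a region with the hint set, assuming Linux with glibc.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Reads the active THP mode, i.e. the bracketed word in a line such as
   "always [madvise] never". Returns 0 and stores the mode in `mode`,
   or -1 if the file is missing (kernel without THP) or cannot be parsed. */
int thp_mode(char *mode, size_t size)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (!ok)
        return -1;
    char *start = strchr(line, '[');
    char *end = start ? strchr(start, ']') : NULL;
    if (!start || !end || (size_t)(end - start - 1) >= size)
        return -1;
    size_t n = (size_t)(end - start - 1);
    memcpy(mode, start + 1, n);
    mode[n] = '\0';
    return 0;
}

/* Maps anonymous memory and marks it as a THP candidate. The hint matters
   in "madvise" mode, is redundant in "always" mode, and has no effect in
   "never" mode; madvise() errors are deliberately ignored. */
void *map_thp_hint(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED)
        madvise(p, len, MADV_HUGEPAGE);
    return p;
}
```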
Efficient: mmap()
Another approach relating to your in-house application is not a safeguard in the strictest sense but a strategy for using memory efficiently, and thus indirectly for reducing the need to swap. The mechanism is based on the well-known mmap() call [8] [9]. This system call maps regions of files into the caller's virtual address space, so that the caller can, for example, manipulate file contents directly in memory.
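A minimal sketch of this mechanism, assuming a POSIX system: the hypothetical function uppercase_in_place() maps an entire file with MAP_SHARED and changes its contents through ordinary memory accesses, without any read() or write() calls. The changes end up in the page cache and are written back to the file; msync() merely forces this to happen immediately.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Converts ASCII lowercase letters in the file at `path` to uppercase by
   editing a shared mapping of the file. Returns the number of bytes
   changed, or -1 on error. */
long uppercase_in_place(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  /* mmap() rejects a length of 0 */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    /* MAP_SHARED: writes go to the page cache and from there to the file. */
    char *data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after closing the descriptor */
    if (data == MAP_FAILED)
        return -1;
    long changed = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] >= 'a' && data[i] <= 'z') {
            data[i] = (char)(data[i] - 'a' + 'A');
            changed++;
        }
    }
    msync(data, len, MS_SYNC);  /* optional: flush to disk right away */
    munmap(data, len);
    return changed;
}
```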
In some ways, mmap() replaces normal file operations such as read(). As Figure 3 shows, however, its implementation in Linux differs in one important detail: For read() and similar calls, the operating system keeps the data it reads twice, once in the page cache and once in a page belonging to the application's buffer. The mmap() call avoids this duplication: Linux stores the data only in the page cache and adjusts the page-to-page-frame mappings so that the application accesses those page frames directly.
In other words, mmap() saves space in physical memory, thereby reducing the likelihood that your own pages are swapped out, at least in principle and to a certain, typically small, extent.
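To illustrate the difference shown in Figure 3, the following two hypothetical functions count the lines in a file, once with read() and once with mmap(), assuming a POSIX system. The read() variant copies every chunk from the page cache into its own 64KB buffer; the mmap() variant iterates directly over the page-cache pages mapped into its address space. Both return the same result; only the path the data takes through memory differs.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* read() variant: each chunk is copied from the page cache into `buf`,
   so the chunk currently being processed exists twice in memory. */
long count_lines_read(const char *path)
{
    char buf[64 * 1024];
    long lines = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            lines += (buf[i] == '\n');
    close(fd);
    return n < 0 ? -1 : lines;
}

/* mmap() variant: the page-cache pages are mapped into the address space
   and read in place, without an extra copy. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  /* mmap() rejects a length of 0 */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (data == MAP_FAILED)
        return -1;
    long lines = 0;
    for (size_t i = 0; i < len; i++)
        lines += (data[i] == '\n');
    munmap((void *)data, len);
    return lines;
}
```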
None of the approaches presented here for making your application swap-resistant is completely satisfactory. Another idea could therefore be to change the behavior of other applications to better protect your own. Linux has long offered the ability to restrict an application's resource consumption using setrlimit() and similar system calls. However, these calls presumably require changes to third-party code, which is obviously not a viable option.
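For reference, here is a sketch of how setrlimit() restricts a process. To leave the caller unaffected, the hypothetical function try_alloc_with_limit() forks a child, lowers that child's RLIMIT_AS soft limit (the maximum size of its virtual address space), and lets it attempt an anonymous mmap(). Mappings that would exceed the limit fail with ENOMEM. The example assumes Linux and a hard limit at least as high as the requested soft limit.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if a child limited to `limit_bytes` of address space could map
   `alloc_bytes`, 1 if the limit made the mapping fail, -1 on other errors. */
int try_alloc_with_limit(rlim_t limit_bytes, size_t alloc_bytes)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: only this process is restricted. */
        struct rlimit rl;
        if (getrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);
        rl.rlim_cur = limit_bytes;  /* soft limit; must not exceed rl.rlim_max */
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);
        void *p = mmap(NULL, alloc_bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        _exit(p == MAP_FAILED ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code <= 1 ? code : -1;
}
```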
Everything Under Control
At first glance, control groups (cgroups) [10], which have been around since Linux 2.6, could be a better alternative. With their help, and with sufficient privileges, you can assign even a third-party process to a group, which then controls the process's resource usage. Unfortunately, this raises difficult questions for system administrators, just as restricting resources in the shell with ulimit does:
- Can you really restrict one application at the expense of others? Even with a sensible distribution of applications across machines, this kind of prioritization is often difficult to get right.
- What are reasonable values for the restrictions?
- Is it tolerable for the application to be terminated when it reaches the limits?
- Finally, you can question the basic approach: Are fixed restrictions for third-party applications really what you are trying to achieve? Or should the Linux kernel simply treat your own application with more care in special cases?
These questions show that cgroups ultimately do not solve the underlying problem.
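For comparison, assigning a process to a cgroup with a memory limit comes down to writing a few files. This sketch assumes the cgroup v2 hierarchy (mounted at /sys/fs/cgroup on most current distributions), an already created group directory with the memory controller enabled, and root privileges; cgroup_limit_process() and any directory you pass to it are hypothetical. The memory.max file sets the hard limit in bytes, and writing a PID to cgroup.procs moves that process into the group.

```c
#include <stdio.h>
#include <sys/types.h>

/* Writes one numeric value plus newline to a cgroup interface file.
   Kernel-side errors surface when the stream is flushed in fclose(). */
static int write_cgroup_file(const char *dir, const char *file, long long value)
{
    char path[4096];
    if (snprintf(path, sizeof path, "%s/%s", dir, file) >= (int)sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%lld\n", value) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Sets a hard memory limit on the cgroup directory `dir`
   (e.g. "/sys/fs/cgroup/demo") and moves process `pid` into it. */
int cgroup_limit_process(const char *dir, long long max_bytes, pid_t pid)
{
    if (write_cgroup_file(dir, "memory.max", max_bytes) != 0)
        return -1;
    return write_cgroup_file(dir, "cgroup.procs", (long long)pid);
}
```

If the group's usage reaches memory.max and the kernel cannot reclaim enough memory, the OOM killer terminates processes in the group, which is exactly the termination question raised above.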
The application buffer is swapped out because the page cache claims too much memory. One possible approach is therefore to keep the page cache from growing to the point where it competes with application memory. With the cooperation of applications, this is possible to a certain extent.
Developers can instruct the operating system from within their application code to bypass the page cache, contrary to its usual behavior. On Linux, this is commonly referred to as direct I/O [11]. It sidesteps the operating system's caching, for example, to give a database full control over the behavior of its own caches. Direct I/O is requested with the O_DIRECT flag for open(). A related technique is to call posix_fadvise() with POSIX_FADV_DONTNEED to tell the kernel that a file's cached pages are no longer needed, so that it can drop them.
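A minimal sketch of both techniques, assuming Linux with glibc: the hypothetical function write_uncached() first tries to write with O_DIRECT, which requires the buffer, file offset, and length to be suitably aligned (4KB is assumed here). If the filesystem rejects O_DIRECT with EINVAL, it falls back to a buffered write, flushes the data with fdatasync(), and then uses POSIX_FADV_DONTNEED to advise the kernel that the file's cached pages can be discarded. The flush comes first because the kernel only drops clean pages.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIO_ALIGN 4096  /* assumption: 4 KB alignment satisfies the device */

/* Writes `blocks` blocks of DIO_ALIGN bytes, all set to `fill`, to `path`
   while keeping the data out of the page cache. Returns 1 if O_DIRECT was
   used, 0 if the buffered fallback with POSIX_FADV_DONTNEED was used,
   or -1 on error. */
int write_uncached(const char *path, size_t blocks, int fill)
{
    size_t len = blocks * DIO_ALIGN;
    void *buf = NULL;
    /* O_DIRECT needs an aligned buffer, offset, and length. */
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0)
        return -1;
    memset(buf, fill, len);

    int result = -1;
    for (int direct = 1; direct >= 0; direct--) {
        int flags = O_WRONLY | O_CREAT | O_TRUNC | (direct ? O_DIRECT : 0);
        int fd = open(path, flags, 0644);
        if (fd < 0) {
            if (direct && errno == EINVAL)
                continue;  /* O_DIRECT unsupported here: retry buffered */
            break;
        }
        ssize_t n = write(fd, buf, len);
        if (n < 0 && direct && errno == EINVAL) {
            close(fd);
            continue;      /* rejected at write time: retry buffered */
        }
        if (n == (ssize_t)len && !direct && fdatasync(fd) == 0) {
            /* Data is on disk, so the cached pages are clean and can go. */
            posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        }
        if (close(fd) == 0 && n == (ssize_t)len)
            result = direct;
        break;
    }
    free(buf);
    return result;
}
```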
Unfortunately, very few applications use direct I/O, and because a single misconfigured application can unbalance a stable system, such strategies can at best delay the swapping out of the application buffer.