A ptrace-based tracing mechanism for syscalls
Hidden Treasures
Scientific applications can be limited by file I/O because of their distributed nature and implementation of storage. The libiotrace library, extended with ptrace-based syscall tracing, lets users and developers analyze file-I/O-based bottlenecks.
LD_PRELOAD
Dynamic profiling analyzes a program during runtime, thereby gathering information about resource utilization. Often, you want to know which part of a program causes the highest CPU utilization or which task uses how much memory. This information can be used either to optimize the program or to manage a running system and prevent bottlenecks. You can find a multitude of tools for gathering such data.
If the source code of the program is available, you can use a debugger or an instrumentation framework to gather resource usage information. Most compilers offer an instrumentation framework that allows you to insert profiling functionality at compile time. For example, the GNU Compiler Collection (GCC) offers support for gprof [1], which can be used to analyze the time spent in each part of a program.
If you don't have the source code or don't want to recompile the program in question, you can use the LD_PRELOAD environment variable. Launching a program causes the executable to be loaded into memory and executed as a new process. All the dynamic libraries (so-called shared objects) on which the program depends are loaded into the process. Once all libraries are part of the process memory, the linker collects all exposed functions from the libraries and links them against function calls. If a function is provided by more than one library, the first match is used for linking.
The environment variable LD_PRELOAD allows you to load and link a library before any other library is loaded, so you can use it to get the program to call your own provided implementation of a library function. The libiotrace implementation of the function gathers information (e.g., execution time of the function and function parameters) and writes the collected data to a buffer. It also calls the function that would have been called without LD_PRELOAD by resolving the address of the function with dlsym, which then returns the second match from all loaded libraries. The set of tools in gperftools [2] uses LD_PRELOAD as well.
libiotrace: Just Another Profiling Tool?
A typical CPU computes data faster than data can be fetched from or stored to main memory (the so-called memory wall) [3]. Storage only exacerbates this problem. File I/O can therefore be a highly relevant factor for program optimization. The libiotrace [4] library uses LD_PRELOAD to gather data about POSIX [5] and MPI [6] file I/O functions. Although other tools, such as Darshan [7], use this method too, libiotrace adds live tracing support, as opposed to the pure postmortem analysis of most tools.
A typical tracing setup for libiotrace is shown in Figure 1, which illustrates how LD_PRELOAD is used to "insert" libiotrace between the profiled program and libc.
Typically, the data is either written to a logfile, sent to an InfluxDB instance, or both. InfluxDB serves as a data source for Grafana for near real-time visualization.
Supplemental Profiling with ptrace
During work on libiotrace, a few instances of file I/O couldn't be traced by wrapping the file I/O functions with LD_PRELOAD. This file I/O uses function calls that are evidently not dynamically linked against the libiotrace wrappers when the executable starts. Tracing this file I/O required further investigation.
Running the program with strace revealed that the file I/O in question does in fact call the POSIX function open64. The same behavior could be observed in a debugger. A closer look at the function call in the stack trace of the debugger showed the root cause: The function call wasn't going through the procedure linkage table (PLT), which is generated during compile time to enable relocations of function addresses. Each entry in the PLT is a stub function that is called instead of the function itself (located in a shared object).
During runtime, the dynamic linker looks up the function address in the shared object and provides it to the stub function in the PLT. The stub function then uses this address to call the function in the shared object file. A PLT entry is thus a layer of indirection used to relocate function calls in a single place per loaded object. A disassembly of the library in use proved that open64 had no PLT entry in this library. An example of this situation can be found in the implementation of dlopen in libdl (the linker itself).
The dlopen function can be used by a program to load a dynamic library. In fact, this function is used by the linker itself to load and link shared object files. To open and read a shared object file, dlopen calls an implementation of open or open64 (on a 64-bit system), which are both part of the libc library. A call to dlopen can happen before the dynamically linked libc is available (e.g., during the loading process of libc).
Therefore, a relocation of open or open64 during a call of dlopen is not feasible, so the dlopen implementation ships with its own statically linked version of the open functions. On further research, even more libraries with statically linked versions of POSIX file I/O functions were found (e.g., some versions of the libpthread library).
To make matters worse, other mechanisms can prevent linking against a function loaded by LD_PRELOAD. For example, the linker options -Bsymbolic and -Bsymbolic-functions can ensure that a call to a function will use a local function inside a shared library and not a function exposed from a different object. Furthermore, flags for dlopen (RTLD_NOW and RTLD_DEEPBIND) can change when symbols are bound and the order in which dynamic libraries are searched for functions to link.
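To make the dlopen case concrete, the following C sketch loads a library with the flags named above. The helper names load_with_deepbind() and lookup_in() are hypothetical, and RTLD_DEEPBIND is assumed to be available as a glibc extension. With this flag, the loaded library prefers its own symbols and those of its dependencies over the global scope, which is where a preloaded wrapper lives.

```c
#define _GNU_SOURCE /* RTLD_DEEPBIND is a glibc extension */
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: load a shared object so that symbol lookups made
 * on its behalf search the object and its dependencies first, ahead of
 * the global scope (and therefore ahead of LD_PRELOAD libraries).
 * Returns NULL on failure; dlerror() describes the reason. */
void *load_with_deepbind(const char *soname)
{
    return dlopen(soname, RTLD_NOW | RTLD_DEEPBIND);
}

/* Hypothetical helper: resolve a symbol through the handle returned above. */
void *lookup_in(void *handle, const char *symbol)
{
    if (handle == NULL)
        return NULL;
    return dlsym(handle, symbol);
}
```

The link-time counterpart is passed through the compiler driver, for example gcc -shared -fPIC -Wl,-Bsymbolic-functions when building the shared library. In both cases, calls that the library makes to functions it can resolve locally may never reach a wrapper supplied with LD_PRELOAD.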
In conclusion, LD_PRELOAD is not sufficient if you want to profile all file I/O. Enter syscall tracing.
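As a rough idea of what syscall tracing involves, the C sketch below starts a program under ptrace and counts the stops reported at every syscall entry and exit. It is a generic Linux illustration under stated assumptions, not the libiotrace implementation: the function name count_syscall_stops() is hypothetical, only a single-threaded child is followed, and the syscall numbers and arguments (which are architecture specific) are not decoded.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] under ptrace and return the number of
 * syscall-entry and syscall-exit stops observed, or -1 on error. */
long count_syscall_stops(char *const argv[])
{
    int status;
    int deliver = 0; /* signal to pass on when resuming the child */
    long stops = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: request tracing, then stop until the parent is ready. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        raise(SIGSTOP);
        execvp(argv[0], argv);
        _exit(127);
    }

    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Report syscall stops as SIGTRAP|0x80 to tell them apart from signals. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }

    for (;;) {
        /* Resume until the next syscall entry or exit (or signal). */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)deliver) < 0) {
            kill(child, SIGKILL);
            waitpid(child, &status, 0);
            return -1;
        }
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return stops;

        deliver = 0;
        if (WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);
            if (sig == (SIGTRAP | 0x80))
                stops++;       /* syscall entry or exit */
            else if (sig != SIGTRAP)
                deliver = sig; /* forward ordinary signals to the child */
        }
    }
}
```

Because the kernel reports these stops regardless of how the syscall was reached, calls that bypass the PLT or a preloaded wrapper are still visible. A tracer that wants file I/O details would additionally read the registers at each stop (for example with PTRACE_GETREGS on x86-64) to identify the syscall and its arguments.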