Darshan I/O analysis for Deep Learning frameworks

Looking and Seeing

The Darshan [1] userspace tool is often used for I/O profiling of HPC applications. It is broken into two parts: The first part, darshan-runtime, gathers, compresses, and stores the data. The second part, darshan-util, postprocesses the data.
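The darshan-util side includes tools such as darshan-parser, which dumps a binary Darshan log as text. As a rough illustration of postprocessing, the following Python sketch totals per-file read and write byte counters from that text output. It assumes the tab-separated layout that darshan-parser prints for counter records (module, rank, record ID, counter, value, file name, mount point, file system type), so check it against the output of your installed version. The function name `summarize` is hypothetical.

```python
from collections import defaultdict

def summarize(parser_output, counters=("POSIX_BYTES_READ", "POSIX_BYTES_WRITTEN")):
    """Sum selected counters per file from darshan-parser text output.

    Assumed data-line layout (tab separated):
    module, rank, record id, counter, value, file name, mount point, fs type
    Header/comment lines start with '#' and are skipped.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for line in parser_output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and headers
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # not a counter record
        counter, value, file_name = fields[3], fields[4], fields[5]
        if counter in counters:
            # Sum across ranks/records that refer to the same file name.
            totals[file_name][counter] += float(value)
    return {name: dict(per_file) for name, per_file in totals.items()}
```

To feed it a real log, you could capture the tool's output with `subprocess.run(["darshan-parser", log_path], capture_output=True, text=True).stdout`, where `log_path` is the path to a Darshan log file.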

Darshan gathers its data either through compile-time wrappers or through dynamic library preloading. For message passing interface (MPI) applications, you can use the provided wrappers (Perl scripts) to create an instrumented binary. Darshan uses the MPI profiling interface to gather information about an application's I/O patterns. It does this by "… injecting additional libraries and options into the linker command line to intercept relevant I/O calls" [2] (section 5.1).
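The wrap-and-forward idea behind this interception can be sketched in a few lines. The Python code below is only an analogy, not Darshan code: real Darshan wrappers are C functions linked into the binary (for MPI calls, a wrapper named `MPI_File_open` can do its bookkeeping and then call the real implementation through its profiling-interface name, `PMPI_File_open`), whereas here a decorator wraps `os.write` and `os.read`. The counters are loosely modeled on Darshan records such as POSIX_WRITES and POSIX_BYTES_WRITTEN; the names `intercept`, `stats`, `traced_write`, and `traced_read` are hypothetical.

```python
import functools
import os
import tempfile
import time
from collections import defaultdict

# Per-call-type counters (call count, bytes moved, cumulative seconds).
stats = defaultdict(lambda: {"calls": 0, "bytes": 0, "seconds": 0.0})

def intercept(name, nbytes):
    """Return a decorator that forwards to the real call, then records stats.

    `nbytes(args, result)` extracts how many bytes the call moved.
    """
    def decorator(real_call):
        @functools.wraps(real_call)
        def wrapper(*args):
            start = time.perf_counter()
            result = real_call(*args)  # forward to the real implementation
            rec = stats[name]
            rec["calls"] += 1
            rec["seconds"] += time.perf_counter() - start
            rec["bytes"] += nbytes(args, result)
            return result
        return wrapper
    return decorator

# Wrapped versions of two POSIX-style calls.
traced_write = intercept("write", lambda args, res: res)(os.write)     # returns bytes written
traced_read = intercept("read", lambda args, res: len(res))(os.read)  # returns the data read

if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        traced_write(fd, b"hello, darshan")
        os.lseek(fd, 0, os.SEEK_SET)  # rewind before reading back
        traced_read(fd, 1024)
    print({name: rec for name, rec in stats.items()})
```

The application code calls the traced functions exactly as it would call the originals; the bookkeeping happens transparently in the wrapper, which is the property a link-time or preload-based profiler relies on.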

For MPI applications, you can also profile precompiled binaries. In this case, you set the LD_PRELOAD environment variable to point to the Darshan shared library, which the dynamic loader then loads ahead of the application's other libraries so that Darshan can intercept the I/O calls. This approach lets you profile uninstrumented binaries for which you don't have the source code (perhaps independent software vendor applications) or applications you don't want to rebuild.

For non-MPI applications, you must use the LD_PRELOAD environment variable with the Darshan shared library.
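A launcher that preloads Darshan only needs to adjust the child process's environment. The Python sketch below builds such an environment and runs a command with it. The library path in the usage note is hypothetical (it depends on where darshan-runtime is installed), and the `DARSHAN_ENABLE_NONMPI` variable is an assumption based on recent Darshan releases, which require non-MPI instrumentation to be switched on explicitly; verify it against the darshan-runtime documentation for your version. The function names `darshan_env` and `run_with_darshan` are hypothetical.

```python
import os
import subprocess

def darshan_env(darshan_lib, non_mpi=False, base=None):
    """Return a copy of `base` (default: os.environ) that preloads Darshan."""
    env = dict(os.environ if base is None else base)
    # Put Darshan first so its wrappers take precedence; keep existing preloads.
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{darshan_lib}:{existing}" if existing else darshan_lib
    if non_mpi:
        # Assumption: recent Darshan versions need this for non-MPI programs.
        env["DARSHAN_ENABLE_NONMPI"] = "1"
    return env

def run_with_darshan(cmd, darshan_lib, non_mpi=True):
    """Run `cmd` (a list of arguments) with the Darshan library preloaded."""
    return subprocess.run(cmd, env=darshan_env(darshan_lib, non_mpi=non_mpi), check=True)
```

For example, `run_with_darshan(["python3", "train.py"], "/opt/darshan/lib/libdarshan.so")` would profile a hypothetical training script. Copying the environment rather than modifying `os.environ` keeps the preload confined to the child process, so the launcher itself is not instrumented.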

Deep Learning (DL) frameworks such as TensorFlow are becoming an increasingly large part of HPC workloads. Because one of the tenets of DL is to use as much data as possible, and terabyte-scale datasets are quite common, understanding the I/O patterns of these applications is important. In this article, I take Darshan, a tool built around HPC and MPI, and use it to examine the I/O pattern of TensorFlow on a small problem, one that I can run on my home workstation.