Favorite benchmarking tools
High Definition
At the Dragon Propulsion Laboratory, we are partial to using the simplest tool that will do the job at hand – particularly when dealing with the inherent complexity that performance measurement (and tuning) brings to the table. Yet that same complexity often requires advanced tooling to resolve the riddles posed by performance questions. I will examine my current benchmarking tool favorites, from the simplest to the most sophisticated.
Tempus Fugit
The benchmark archetype is time: simple, easy to use, and well understood by most users. In its purest form, time takes a command as a parameter and times its execution in the real world (real), as well as how much CPU time was allocated in user (user) and kernel (sys) modes:
$ time sleep 1

real    0m1.004s
user    0m0.002s
sys     0m0.001s
What not everyone knows is that the default time command is not the standalone binary at all, but part of Bash itself, as documented in the bash-builtins man page [1]:
$ type time
time is a shell keyword
$ which time
/usr/bin/time
There is time, and then there is GNU time [2]. The standalone binary version sports a few additional capabilities, the most noteworthy being its ability to measure page faults and swapping activity by the tested binary:
$ /usr/bin/time gcc test.c -o test
0.03user 0.01system 0:00.05elapsed 98%CPU (0avgtext+0avgdata 20564maxresident)k
0inputs+40outputs (0major+4475minor)pagefaults 0swaps
The sixth Dojo was dedicated to GNU time's amazing capabilities, and I invite you to read up on it in your prized archive of ADMIN back issues [3]. Table 1 sums up the capabilities of this versatile tool, which include memory use, basic network benchmarks (packet counts), filesystem I/O operations (read and write counts), and page faults, both minor (MMU) and major (disk access).
Table 1: Format Specifiers*

| Option | Function |
|---|---|
| C | Image name and command-line arguments (argv) |
| D | Average size of the process's unshared data area (KB) |
| E | Wall clock time used by the process ([hours:]minutes:seconds) |
| F | Number of major page faults (requiring I/O) incurred |
| I | Number of filesystem inputs |
| K | Average total (data + stack + text) memory use of the process (KB) |
| M | Maximum resident set size of the process (KB) |
| O | Number of filesystem outputs by the process |
| P | Percentage of the CPU that this job received (user + system divided by running time) |
| R | Number of minor page faults (not requiring I/O, resolved in RAM) |
| S | Kernel-mode CPU seconds allocated to the process |
| U | User-mode CPU seconds allocated to the process |
| W | Number of times the process was swapped out of main memory |
| X | Average amount of shared text in the process (KB) |
| Z | System's page size (bytes) – this is a system constant |
| c | Number of times the process was context-switched involuntarily (time slice expired) |
| e | Wall clock time used by the process (seconds) |
| k | Number of signals delivered to the process |
| p | Average unshared stack size of the process (KB) |
| r | Number of socket messages received |
| s | Number of socket messages sent |
| t | Average resident set size of the process (KB) |
| w | Number of times that the program was context-switched voluntarily |
| x | Exit status of the command |

* Available in GNU time, version 1.7 (adapted from the man page).
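These specifiers plug into GNU time's -f option, which accepts a printf-style format string with each letter prefixed by a percent sign. For instance, to report elapsed time, peak memory, and major page faults for the compile shown earlier (the format string below is just one I find handy):

$ /usr/bin/time -f "%e s elapsed, %M KB max RSS, %F major faults" gcc test.c -o test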
Going Deeper
Back in issue 12, I mentioned Martin Pool's promising tool judge [4]. Unfortunately, judge never made it past version 0.1, with its most recent release dating back to 2011. However, Martin's efforts have an impressive successor in David Peter's recent hyperfine [5], which is a step up from time when timing a run, because it runs the task a number of times and generates relevant statistics.
Remarkably, the tool determines on its own how many runs are necessary to generate a statistically valid result. By default, it performs 10 runs of the specified command, but the minimum number of runs can be tuned manually (-m). Returning to the same sleep [6] example I used previously, I show a snapshot of the interactive progress report in Figure 1, followed by the final results of the run in Figure 2.
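If you want to follow along at home, an invocation along these lines reproduces the test; the exact timings will naturally vary from machine to machine:

$ hyperfine 'sleep 1'        # 10 runs by default
$ hyperfine -m 20 'sleep 1'  # insist on at least 20 runs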
In this test, hyperfine sampled 10 runs of a simple delay command, determined an expected value (mean) 1.6ms greater than requested with a standard deviation (sigma) of 0.3ms [7], and reported the range between the minimum and maximum values encountered – handling both insubstantial results and the appearance of significant outliers. Incongruous data will generate warnings like the following:
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs.
Very short execution times will lead to warnings requesting a better test and will likely inflate the execution count: Try hyperfine 'ls' to see this happen in practice. Such a program's runtime can be dominated by startup time, or the hot-cache case might not be what you want to measure.
The tool can account for cache warmup runs (--warmup N), and if the opposite behavior is desired, it is just as easy to clear caches before every run by passing the appropriate command (--prepare COMMAND). Exporting the timing results for all runs is also conveniently supported in JSON, CSV, and even Markdown.
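To make these options concrete, this is how one might drive a warm-cache run, a cold-cache run (the drop_caches write is the usual Linux trick, and it requires root), and a Markdown export comparing two commands; the grep workload here is merely illustrative:

$ hyperfine --warmup 3 'grep -r TODO src/'
$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'grep -r TODO src/'
$ hyperfine --export-markdown results.md 'sleep 0.5' 'sleep 1'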
Bench Test
The bench [8] command is also geared toward statistical analysis of multiple sampling runs, but it provides much less feedback as measurements are being taken. Once samples are obtained, however, the analysis of the results is comprehensive (Figure 3). As with judge, multiple commands can be compared in a single run, but the pièce de résistance of bench is really its ability to generate beautiful charts from the data. Figure 4 makes the case for bench plain, displaying the report generated by the test in Figure 5. Everything is in one place and ready to share with others on your team.
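As a sketch of such a run, assuming bench is already on your path and standing in sleep calls for the real workloads, comparing two commands and writing the HTML report with its charts is a one-liner:

$ bench 'sleep 0.5' 'sleep 1' --output report.html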
Lack of pre-built packages in the major Linux distributions and the broken builds for macOS [9] found in brew [10] make installing bench less than the ideal experience, but its usefulness more than makes up for the minor inconvenience of having to use Haskell's stack build system [11] for setup.
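For the record, the stack route boils down to two commands (assuming stack itself is already installed; the first build fetches the GHC toolchain, so allow it some time):

$ stack setup          # one-time toolchain installation
$ stack install bench  # builds and copies bench to ~/.local/bin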
Infos
- bash-builtins (7) man page: https://manpages.ubuntu.com/manpages/bionic/en/man7/bash-builtins.7.html#see%20also
- time (1) man page: https://manpages.ubuntu.com/manpages/bionic/en/man1/time.1.html
- "Time Out" by Federico Lucifredi, ADMIN , issue 12, 2012, pg. 96
- judge: http://judge.readthedocs.org/en/latest/
- hyperfine: https://github.com/sharkdp/hyperfine
- sleep (1) man page: https://manpages.ubuntu.com/manpages/bionic/en/man1/sleep.1.html
- Standard deviation: https://en.wikipedia.org/wiki/Standard_deviation
- bench: https://github.com/Gabriel439/bench
- bench GitHub issue 12: https://github.com/Gabriel439/bench/issues/12
- Homebrew: https://brew.sh/
- The Haskell tool stack: https://docs.haskellstack.org/en/stable/README/