Baselines are more important than the benchmark
Witness Mark
"It is too slow": These four words launch nearly every performance analysis. Slow is an ambiguous term, which may equally represent very different concerns, including change in performance compared with a previous versions or how the same software ran on some previous day. Equally, it may represent inadequate performance compared with what performance a system is believed to be capable of delivering.
I have examined this second definition before [1], studying how the library of CPU benchmarks published by the 7-Zip project [2] can be compared with observed CPU performance on a given system. This month, I explore how to define baselines for I/O, covering the storage and network subsystems. The I/O paths are more vulnerable than CPU or memory to performance setbacks caused by system or software misconfiguration, because they rely on multiple components performing optimally, or, in the case of the network, on external hops through the Internet itself.
The Network Is the Computer
Setting a baseline essentially means establishing the highest performance that can be expected from the system at hand when it is properly configured. The simplest tool that reproduces the tested configuration is always to be preferred, to limit the number of variables under consideration. The application the system is actually meant to run is usually a much richer and more complex beast, and if you are testing for performance, it has probably already shown undesirable behavior anyway.
The iperf3 tool [3] is your go-to utility for testing a network path's baseline, end to end. Found in the Ubuntu Universe repository (install with apt install iperf3) and in macOS's Homebrew (brew install iperf3), it measures the effective bit rate between two systems (Figure 1) by taking multiple samples and averaging the results. By default, iperf3 uploads from client to server, but the reverse is also possible (-R option), sending data in the opposite direction to validate asymmetric network paths (see the "Hot-Potato Routing" box). Here, I have set up a server in the cloud (Figure 2) running Ubuntu Focal 20.04; the client is running in a terminal on my MacBook as I write. Ports to Microsoft Windows also exist [5], and testing UDP instead of the default TCP is an option as well.
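As a minimal sketch of such a session (baseline.example.com stands in for your own server), start the listener on the remote end and point the client at it:

```
# On the server: listen on the default port, 5201
iperf3 -s

# On the client: a 30-second upload test, reporting at one-second intervals
iperf3 -c baseline.example.com -t 30 -i 1

# The same test in the download direction (-R),
# and a UDP run (-u) targeting 100Mbps
iperf3 -c baseline.example.com -t 30 -R
iperf3 -c baseline.example.com -t 30 -u -b 100M
```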
Hot-Potato Routing
Peering agreements between large networks may not specify traffic cost settlement or prescribe specific routes, granting operators the freedom to define paths between their networks. In such cases, "hot-potato routing" [4] often results, with a network handing over a packet to its peer at the closest available peering point to minimize network load and costs. When the peer network adopts the same practice, different routes may be in use for the two directions of a given network connection; asymmetric paths through the Internet are not at all uncommon. Although not normally a concern and usually invisible to the end user, this is important to keep in mind while testing network connections.
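One way to make such asymmetry visible, sketched here with the stock traceroute tool (the hostname and address below are placeholders), is to trace the path from each end and compare the intermediate hops:

```
# From the client, trace the forward path toward the server
traceroute server.example.com

# From the server, trace the return path toward the client's public IP;
# differing intermediate hops reveal an asymmetric route
traceroute 203.0.113.10
```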
Secure Testing
SSL/TLS connections are another common networking baseline case, both because they place a significant compute load on the remote endpoint and because of the complexity of the server architectures involved. Often spanning multiple servers on the remote end, an SSL benchmark is the ultimate test of what takes place in practice, as opposed to what theory predicts. The OpenSSL project incorporates two useful benchmarks: openssl speed [6], which tests cryptographic performance in the abstract on the local machine without any network access, and the s_time command [7], which performs an all-encompassing, end-to-end test. The s_time benchmark can be used to examine a server's capacity to handle connections in a given time span and how long it takes to serve secured content.
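A minimal run of both, with www.example.com standing in for the endpoint under test, might look like this:

```
# Local, network-free benchmark of a single cipher
openssl speed aes-256-cbc

# End to end: open as many new TLS sessions as possible in 10 seconds
openssl s_time -connect www.example.com:443 -new -time 10

# The same test reusing the session ID from the first connection
openssl s_time -connect www.example.com:443 -reuse -time 10
```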
Figure 3 shows an example test with Google's search engine as the endpoint, historically one of the Internet's snappiest. Results are provided for new sessions as well as for cached session IDs, which performed significantly better. Another consideration to keep in mind is that the test runs against Google as a whole: The primary domain operates in an obvious round-robin setup (see the "Round-Robin DNS" box), and further schemes such as load balancers or Anycast routing [8] are probably in use as well.
Round-Robin DNS
In a round-robin DNS setup, multiple addresses are offered as destinations for the same domain name. For example, today, at my office in Westford, www.google.com resolves to the following six different IP addresses:
```
$ nslookup www.google.com
Server:     10.192.206.245
Address:    10.192.206.245#53

Non-authoritative answer:
Name:    www.google.com
Address: 142.251.111.106
Name:    www.google.com
Address: 142.251.111.104
Name:    www.google.com
Address: 142.251.111.103
Name:    www.google.com
Address: 142.251.111.105
Name:    www.google.com
Address: 142.251.111.147
Name:    www.google.com
Address: 142.251.111.99
```
The ordering of these answers will differ from query to query, and the DNS client will pick one address (likely the first) and connect to it. Whether repeated connections to the same domain name result in connections to the same IP is implementation dependent, something a clever performance engineer may well choose to disambiguate explicitly with an IP address, depending on the ultimate objective of the test.
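When pinning the test to a single member of the pool, one option is to override name resolution for just that request; this sketch uses curl with one of the addresses from the lookup above:

```
# Force www.google.com to resolve to a single pool member for this request,
# keeping SNI and the Host header intact, and print the total time taken
curl --resolve www.google.com:443:142.251.111.106 -s -o /dev/null -w '%{time_total}\n' https://www.google.com/
```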
The Storage with the Most
I have covered many I/O testing tools in this column over the years, and most of them can be used to establish a baseline, which is a technique rather than a tool. I/O systems are noisy, and benchmarking them can be exceedingly difficult. In some tests, it may be beneficial to eliminate storage altogether and perform the test directly against a RAM disk. Provisioning a half-gigabyte RAM disk is trivially simple in Linux:
```
# modprobe brd rd_nr=1 rd_size=524288
# ls /dev/ram0
/dev/ram0
```
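To confirm the allocation (rd_size is expressed in KiB, so 524288 yields 512MiB) and to remove the RAM disk once you are done, the following should work:

```
# Report the device size in bytes: 524288 KiB = 536870912 bytes
blockdev --getsize64 /dev/ram0

# Unload the driver, releasing the memory (the device must not be in use)
rmmod brd
```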
You can use this approach to evaluate the performance effect of encrypted partitions without having to worry about the noise of the underlying storage medium, simply comparing the performance of RAM disk access with and without encryption. A similar, complementary technique is writing to the device directly, without an intervening filesystem layer affecting measurements, as I have demonstrated previously [9]. To test encryption without a filesystem, you have to use a detached header to store the encryption keys, lest your benchmark accidentally overwrite them, because they are stored on the same drive by default. Listing 1 details the setup process, resulting in /dev/ram0 accessing the RAM disk directly, whereas /dev/mapper/encrypted-ram0 is first encrypted by LUKS [10] before storing to the same memory. Listing 2 then shows the simplest possible benchmark, with dd [11] comparing block performance in both modes. The raw device performs almost four times as fast as encrypted access to the same storage (784 versus 210MB/s). The critical finding is that encryption will not be the performance bottleneck in this setup, as long as the storage medium is not capable of more than 210MB/s of sustained throughput.
Listing 1
LUKS Encryption Overlay Set-up
```
root@focal:~# # Allocate a half-GB RAM disk
root@focal:~# sudo modprobe brd rd_nr=1 rd_size=524288
root@focal:~# ls /dev/ram0
/dev/ram0
root@focal:~# fallocate -l 2M header.img
root@focal:~# echo -n "not a secure passphrase" | cryptsetup luksFormat -q /dev/ram0 --header header.img -
root@focal:~# # Open ram0 as an encrypted device
root@focal:~# echo -n "not a secure passphrase" | cryptsetup open --header header.img /dev/ram0 encrypted-ram0
root@focal:~# ls /dev/mapper/encrypted-ram0
/dev/mapper/encrypted-ram0
```
Listing 2
Encrypted vs. Plain Text
```
root@focal:~# dd if=/dev/zero of=/dev/ram0 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes (419 MB, 400 MiB) copied, 0.535233 s, 784 MB/s
root@focal:~# dd if=/dev/zero of=/dev/mapper/encrypted-ram0 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes (419 MB, 400 MiB) copied, 1.99686 s, 210 MB/s
```
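For a complementary data point that touches no block device at all, cryptsetup ships an in-memory benchmark of its own; it reports per-cipher encryption and decryption throughput and is a quick way to sanity-check whether a figure like the 210MB/s above is consistent with what the CPU can sustain:

```
# Benchmark the ciphers available to LUKS entirely in memory;
# no storage is involved, so the numbers reflect CPU capability only
cryptsetup benchmark

# Restrict the run to a single cipher and key size
cryptsetup benchmark --cipher aes-xts-plain64 --key-size 512
```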
Infos
- "Data Compression as a CPU Benchmark" by Federico Lucifredi, ADMIN , issue 66, 2021, pg. 94, https://www.admin-magazine.com/Archive/2021/66/Data-Compression-as-a-CPU-Benchmark
- 7-Zip LZMA CPU benchmark library: https://www.7-cpu.com/
- iperf3(1) man page: https://manpages.ubuntu.com/manpages/focal/en/man1/iperf3.1.html
- Hot-potato routing: https://en.wikipedia.org/wiki/Hot-potato_and_cold-potato_routing#Hot-potato_routing
- iperf for Windows: http://www.iperfwindows.com/index.html
- openssl speed(1) man page: https://manpages.ubuntu.com/manpages/focal/man1/speed.1ssl.html
- openssl s_time(1) man page: https://manpages.ubuntu.com/manpages/focal/en/man1/s_time.1ssl.html
- Anycast routing: https://en.wikipedia.org/wiki/Anycast
- "Testing the Samsung MU-PA500B 500GB SSD" by Federico Lucifredi, ADMIN , issue 47, 2018, pg. 92, https://www.admin-magazine.com/Archive/2018/47/Testing-the-Samsung-MU-PA500B-500GB-SSD
- LUKS: https://gitlab.com/cryptsetup/cryptsetup/blob/master/README.md
- dd(1) man page: https://manpages.ubuntu.com/manpages/focal/en/man1/dd.1.html