Network performance in the world's favorite data center
Head in the Cloud
In the Performance Tuning Dojo article two issues ago [1], I established the importance of baselines to every performance analysis. This month, I measure a baseline in the cloud, confirm it meets my expectations, and discuss what to do in light of unexpected surprises. Network performance is a complex problem with many variables. To make it tractable, start by looking at specific scenarios rather than the entire problem space in the abstract. What does network performance look like in AWS Elastic Compute Cloud (EC2) when using familiar tools?
Amazon's Network
At its simplest, networking in AWS is straightforward: Launching an instance with the default networking configuration gives you an instance with a valid IP address. At a minimum, an AWS instance has one network device attached. The maximum number of network devices that can be attached depends on the instance type, but it is safe to assume that all current instance types (except T2 and M3) support enhanced networking for Linux. That is where the easy part ends, with available instance configurations reaching up to 170Gbps with a single card (and 200Gbps with two cards or the Elastic Fabric Adapter) [2]. Verifying that actual wire speed matches the configured throughput is where benchmarking comes in.
Following the advice in Table 1, launch two m6in.2xlarge instances located in the same AWS availability zone for testing (see the "AWS Testing Policy" box). Rated at "up to 40 Gbps" by AWS [6] and equipped with an Elastic Network Adapter (ENA), they should perform well with any operating system providing appropriate hardware support [7]. I am using Ubuntu 22.04 (Jammy) for this test in the original us-east-1 region.
$ aws ec2 run-instances --image-id ami-0ea1c7db66fee3098 --region us-east-1 --key federico --instance-type m6in.2xlarge --count 2 --output text
Table 1: Network Performance Factors*

| Factor | Recommendation | Notes |
|---|---|---|
| Enhanced networking [4] | SR-IOV [5] and better hardware; Elastic Network Adapter (ENA) | |
| Path MTU | Enable jumbo frames | Improves throughput, not latency |
| Direct connection | No intervening gateways | |
| Physical distance | Same availability zone; same region; same continent | The closer the faster |
| Instance type | Bigger is better (CPU bound); credit-IO instances (like T4g) vary performance | |
| Multithreaded application | | |
| Elastic Fabric Adapter | | |

*Major cloud network performance factors, roughly in order of precedence.
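The path MTU row in Table 1 deserves a closer look: jumbo frames improve throughput mainly by cutting fixed per-packet costs. The sketch below (illustrative numbers, assuming plain TCP over IPv4 with no options, plus standard Ethernet framing overhead) estimates payload efficiency at the two MTUs you will encounter on EC2:

```python
# Rough per-packet efficiency for TCP over IPv4: headers cost a fixed
# 40 bytes (20 IP + 20 TCP) out of every MTU-sized packet, and Ethernet
# adds roughly 38 bytes on the wire (preamble, header, FCS, interframe
# gap). Real stacks add TCP options and offloads; this is a sketch.

def tcp_efficiency(mtu: int, l2_overhead: int = 38, l3l4_headers: int = 40) -> float:
    """Fraction of wire bytes carrying TCP payload at a given MTU."""
    payload = mtu - l3l4_headers
    wire = mtu + l2_overhead
    return payload / wire

standard = tcp_efficiency(1500)   # classic Ethernet MTU
jumbo = tcp_efficiency(9001)      # EC2 jumbo-frame MTU

print(f"1500-byte MTU: {standard:.1%} payload")  # 94.9% payload
print(f"9001-byte MTU: {jumbo:.1%} payload")     # 99.1% payload
```

The byte savings look modest, but the larger win is moving the same data in roughly one sixth as many packets, which lowers per-packet CPU and interrupt load.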
AWS Testing Policy
Network performance testing on EC2 may inadvertently resemble a distributed denial of service (DDoS) attack. Although the differences between a benchmark and DDoS attack are clear, their similarity lies in the attempt to reach a target node's limits with external traffic. Nothing in this two-node testing should cause AWS to get upset with you, but when performing testing with a fleet of client nodes, it is worth remembering to review their testing policy [3] and determine whether advance notice of your plans is in order. AWS may have suggestions for your testing plan and will be standing by to intervene during your test should the availability zone's network be negatively affected.
The results are shown in Figure 1. Here I used AWS's CloudShell service, which provides convenient access to the client-side command-line interface (CLI) in a pre-configured environment hosted in a web browser terminal.
Dirty Hands
After logging in to the AWS Console, first double-check that ENA is correctly enabled (Figure 2). At the instance's shell, you can verify the same thing by checking that the corresponding driver is loaded (Figure 3):
$ modinfo ena
The same could have been ascertained through the AWS CLI with a clever query (Figure 4). You don't actually need to repeat this check – pick the interface style that most suits you.
Onward and Upward
After updating the package caches (sudo apt update), you can install iperf3 [8], the standard tool for testing a network path's performance end to end. Found in the Ubuntu Universe repository (install it with sudo apt install iperf3), it measures the effective bit rate between two systems by taking multiple samples and averaging the results. The iperf3 tool defaults to uploads from client to server, but the direction can be reversed with the -R option. Launch the server first:
$ iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Note that the server listens on all the IP addresses of the instance, which will come in handy later. The results are lackluster: You are paying for 40Gbps connectivity, yet the benchmark shows only 4.7Gbps (Figure 5)! What is going on?
Inspecting the connection with tracepath [9] shows that although the client is configured with the "jumbo frame" maximum transmission unit (MTU) of 9001 bytes [10], an intervening node limits the path to the standard Ethernet MTU of 1500 bytes (Figure 6). The explanation is straightforward: You used public IP addressing for your test, and 5Gbps is the highest bandwidth the gateway node of the availability zone (AZ) can supply. Switching the test to the private addresses of your instances (remember that the two instances are in the same AZ), you can achieve a more respectable 9.5Gbps (Figure 7). Because you are now hitting the limits of a single network socket, the next test ramps up to 20 parallel connections:
$ iperf3 -c 172.31.20.174 -i 2 -P 20
The log is somewhat unwieldy, but the final tally reached 34Gbps (Figure 8). The final piece of the puzzle is unexpected: iperf3 is a single-threaded process, so despite your instances sporting eight cores each, the test has become CPU-bound. The older iperf [11] is multithreaded, and it delivers the best result yet, at 39.7Gbps (Figure 9). Amazon was not lying after all! And this is merely the beginning, because some instance types sport eight network adapters.
Infos
[1] "Baselines are more important than the benchmark" by Federico Lucifredi, ADMIN, issue 71, 2022, pg. 94: https://www.admin-magazine.com/Archive/2022/71/Baselines-are-more-important-than-the-benchmark
[2] Amazon AWS, EC2 instance types: https://aws.amazon.com/ec2/instance-types/
[3] Amazon AWS network stress test policy: https://aws.amazon.com/ec2/testing/
[4] Configuring enhanced networking on EC2: https://aws.amazon.com/premiumsupport/knowledge-center/enable-configure-enhanced-networking/
[5] Overview of SR-IOV: https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov-
[6] Instance types by network performance: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/general-purpose-vinstances.html#general-purpose-network-performance
[7] Enhanced networking with the ENA: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-vena.html
[8] iperf3(1) man page: https://manpages.ubuntu.com/manpages/focal/en/man1/iperf3.1.html
[9] tracepath(8) man page: https://manpages.ubuntu.com/manpages/jammy/en/man8/tracepath.8.html
[10] Network MTU: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html
[11] iperf(1) man page: https://manpages.ubuntu.com/manpages/focal/en/man1/iperf.1.html