Lead Image © Lucy Baldwin, 123RF.com

Network performance in the world's favorite data center

Head in the Cloud

Article from ADMIN 73/2023
By Federico Lucifredi
We launch a network performance test on EC2 to answer the question: Is the cloud as fast as expected?

In the Performance Tuning Dojo article two issues ago [1], I established the importance of baselines to every performance analysis. This month, I measure a baseline in the cloud and confirm it meets my expectations, discussing what to do when surprises turn up along the way. Network performance is a complex problem with many variables. To make the problem tractable, you start by looking at specific scenarios, rather than the entire problem space in the abstract. What does network performance look like in AWS Elastic Compute Cloud (EC2) when using familiar tools?

Amazon's Network

At its simplest, networking in AWS is straightforward: Launching an instance with the default networking configuration gives you an instance with a valid IP address. At a minimum, an AWS instance has one network device attached. The maximum number of network devices that can be attached depends on the instance type, but it is safe to assume that all current instance types (except T2 and M3) support enhanced networking for Linux. That is where the easy part ends, with available instance configurations reaching 170Gbps on a single card (and 200Gbps with two cards or the Elastic Fabric Adapter) [2]. Verifying that the expected configuration throughput matches actual wire speed is where benchmarking comes in.

Following the advice in Table 1, launch two m6in.2xlarge instances located in the same AWS availability zone for testing (see the "AWS Testing Policy" box). Rated at "up to 40 Gbps" by AWS [6] and equipped with an Elastic Network Adapter (ENA), they should perform well with any operating system providing appropriate hardware support [7]. I am using Ubuntu 22.04 (Jammy) for this test in the original us-east-1 region.

$ aws ec2 run-instances --image-id ami-0ea1c7db66fee3098 \
    --region us-east-1 --key-name federico \
    --instance-type m6in.2xlarge --count 2 --output text

Table 1

Network Performance Factors*

Factor                      Recommendation                   Notes
Enhanced networking [4]     SR-IOV [5] and better hardware
                            Elastic Network Adapter (ENA)
Path MTU                    Enable jumbo frames              Improves throughput, not latency
Direct connection                                            No intervening gateways
Physical distance           Same availability zone           The closer, the faster
                            Same region
                            Same continent
Instance type                                                Bigger is better (CPU bound)
                                                             Credit-based instances (like T4g) have variable performance
Multithreaded application
Elastic Fabric Adapter

* Major cloud network performance factors, roughly in order of precedence.
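
As a concrete illustration of the Path MTU row, you can check (and, if needed, set) jumbo frames at the operating system level. This is a minimal sketch; the interface name ens5 is an assumption, as it varies by instance type:

$ ip link show dev ens5               # look for "mtu 9001" in the output
$ sudo ip link set dev ens5 mtu 9001  # enable jumbo frames if not already set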

AWS Testing Policy

Network performance testing on EC2 may inadvertently resemble a distributed denial of service (DDoS) attack. Although the differences between a benchmark and a DDoS attack are clear, their similarity lies in the attempt to reach a target node's limits with external traffic. Nothing in this two-node test should cause AWS to get upset with you, but when testing with a fleet of client nodes, it is worth reviewing the AWS testing policy [3] to determine whether advance notice of your plans is in order. AWS may have suggestions for your testing plan and will stand by to intervene during your test should the availability zone's network be negatively affected.

The results are shown in Figure 1. Here I used AWS's CloudShell service, which provides convenient access to the AWS command-line interface (CLI) in a preconfigured environment hosted in a web browser terminal.

Figure 1: Launching a couple of m6in.2xlarge instances from the AWS CloudShell.
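
The tests that follow need both the public and the private IP addresses of the new instances. One way to list them from CloudShell is sketched below; the query shape is my own, not taken from the article:

$ aws ec2 describe-instances \
    --query "Reservations[].Instances[].[InstanceId,PrivateIpAddress,PublicIpAddress]" \
    --output table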

Dirty Hands

Logging in to the AWS Console, first double-check that ENA is correctly enabled (Figure 2). At the instance's shell, you can verify the same by checking that the corresponding driver is loaded (Figure 3):

$ modinfo ena
Figure 2: Browsing the AWS Console to verify the instances indeed have an Elastic network interface.
Figure 3: Checking the network interface at the operating system level.

The same can be ascertained through the AWS CLI with a clever query (Figure 4). You don't actually need to repeat the check; pick whichever interface suits you best.

Figure 4: Querying the network interface from the AWS CLI.
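
The exact query appears in Figure 4; one plausible way to phrase it (the instance ID below is a placeholder) checks the EnaSupport attribute:

$ aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
    --query "Reservations[].Instances[].EnaSupport" --output text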

Onward and Upward

After updating the package caches (sudo apt update), you can install iperf3 [8], the standard tool for testing a network path's performance end to end. Found in the Ubuntu Universe repository (install it with sudo apt install iperf3), it measures the effective bit rate between two systems by taking multiple samples and averaging the results. The iperf3 tool defaults to uploads from client to server, but the direction can be reversed with the -R option. Launch the server first:

$ iperf3 -s
-------------------------
Server listening on 5201
-------------------------
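
From a shell on the second instance, point the client at the server's public IP address. The invocation below is a sketch; the address is a placeholder from the documentation range, so substitute your own:

$ iperf3 -c 203.0.113.10 -i 2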

Note that the server will listen on all of the instance's IP addresses, which will come in handy later. The results are lackluster: You are paying for 40Gbps connectivity, yet the benchmark shows only 4.7Gbps (Figure 5)! What is going on?

Figure 5: The first benchmark shows 4.7Gbps, but the advertised speed is 40Gbps.
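
To find out where the bottleneck lies, you can inspect the path MTU between the two nodes; a minimal check (again with a placeholder address) is:

$ tracepath 203.0.113.10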

The tracepath [9] output shows that although the client is configured with the "jumbo frame" maximum transmission unit (MTU) of 9001 bytes [10], an intervening node is imposing the standard Ethernet MTU of 1500 bytes (Figure 6). The explanation is straightforward: The test used public IP addressing, and 5Gbps is the highest bandwidth the availability zone's (AZ) gateway node can supply. Switching the test to the instances' private addresses (remember that both instances are in the same AZ) achieves a more respectable 9.5Gbps (Figure 7). Because you are now hitting the limits of a single network socket, the next test ramps up to 20 parallel connections:

$ iperf3 -c 172.31.20.174 -i 2 -P 20
Figure 6: Path MTU shows an intermediate node is choking traffic.
Figure 7: Avoiding the intervening gateway results in double the performance.

The log is somewhat unwieldy, but the final tally reached 34Gbps (Figure 8). The final piece of the puzzle is unexpected: iperf3 is a single-threaded process; thus, despite your instances sporting eight cores each, the test's network performance becomes CPU-bound. The older iperf [11] is multithreaded, and it delivers the best result yet, at 39.7Gbps (Figure 9). Amazon was not lying after all! And this is merely the beginning, because some instance types sport eight network adapters.

Figure 8: Using 20 parallel connections gets closer to the network link's capacity.
Figure 9: A multithreaded version of the benchmark overcomes the CPU-bound network performance.
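
For comparison, a multithreaded run with the older iperf mirrors the earlier test. The article does not list the exact invocation, so the flags below (20 parallel client threads) are an assumption:

$ sudo apt install iperf
$ iperf -s                             # on the server
$ iperf -c 172.31.20.174 -P 20 -i 2    # on the client: 20 parallel threads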

The Author

Federico Lucifredi (@0xf2) is the Product Management Director for Ceph Storage at IBM and Red Hat, formerly the Ubuntu Server Product Manager at Canonical, and the Linux "Systems Management Czar" at SUSE. He enjoys arcane hardware issues and shell-scripting mysteries and takes his McFlurry shaken, not stirred. You can read more from him in the O'Reilly title AWS System Administration.
