Kubernetes networking in the kernel

Core Connection

Baseline CNI and Testing Bandwidth

For the practical part of this article, I'll use the iPerf tool for a general indication of Cilium's performance compared with that of Flannel. iPerf is a common tool for testing network throughput: a "server" mode runs at one end of the connection under test and a "client" mode at the other. It's not particularly representative of real-world performance, but it's a reproducible test that gives hard numbers as output, so it's a useful indicator. Earlier, I mentioned the point in a kubeadm installation at which a network plugin is selected and installed. With my test cluster initialized with kubeadm, I installed Flannel directly from GitHub:

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
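While the manifest rolls out, you can watch the Flannel pods come up with a command like the following (the current manifest deploys them into the kube-flannel namespace; older releases used kube-system):

kubectl -n kube-flannel get pods -w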

When the Flannel manifest had been applied successfully and all of its pods were in the Running state, I created a simple deployment with a netshoot container (Listing 1). The netshoot [2] image comes with every network troubleshooting tool you might need (e.g., tcpdump, iperf, netcat, and various DNS tools). I then exposed port 5001 (the default iPerf server port) on that deployment with a Kubernetes service:

Listing 1

netshoot Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: netshoot
  name: netshoot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        command: ["/bin/bash"]
        args: ["-c", "tail -f /dev/null"]

kubectl expose deployment netshoot --port=5001 --target-port=5001
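For reference, the kubectl expose command generates a Service roughly equivalent to this manifest (a sketch; the generated object defaults to type ClusterIP):

apiVersion: v1
kind: Service
metadata:
  name: netshoot
spec:
  selector:
    app: netshoot
  ports:
  - port: 5001
    targetPort: 5001
    protocol: TCP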

The point of adding the Service (rather than having the iPerf client connect directly to the iPerf server's pod IP) is to make sure that Kube-Proxy's iptables rules are included in the routing of the test traffic (for the Flannel test) and to see whether Cilium's alternative approach to service endpoint mapping has any notable effect. Running

iptables -t nat --list KUBE-SERVICES

will show the service mapping rules created by Kube-Proxy.

I exec'd inside the netshoot pod and started the iPerf server with:

iperf -s
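In full, that looks something like the following, using the deploy/ shorthand so you don't need to look up the exact pod name:

kubectl exec -it deploy/netshoot -- iperf -s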

In this state, iPerf listens for incoming test requests from iPerf clients. The iPerf client was then run as a temporary shell inside another netshoot pod. An important control for the experiment is that the iPerf client pod should be on a different worker node from the pod running the iPerf server so that the test traffic runs across the underlay network and therefore has to be encapsulated. The bandwidth results for Flannel (for my particular LAN conditions, with no special attention paid to optimizing maximum transmission unit (MTU) settings or anything else) were up to 1.7Gbps.
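One way to run the client pinned to a particular node is a throwaway pod like the one below (a sketch: worker-2 is a placeholder node name and netshoot is the Service created earlier; you could equally open a temporary shell in a second netshoot pod and run the iperf client manually):

kubectl run iperf-client --rm -it --restart=Never \
  --image=nicolaka/netshoot \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"worker-2"}}' \
  -- iperf -c netshoot -p 5001 -t 30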

Replacing Flannel with Cilium

The Cilium documentation describes a thorough process for migrating to the Cilium network plugin in production clusters where you don't want to disrupt existing workloads and you want to have connectivity between pre- and post-migration workloads [3]. If that's not a concern, you can migrate from Flannel to Cilium as follows:

kubectl delete -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
kubectl -n kube-system delete ds kube-proxy
kubectl -n kube-system delete cm kube-proxy
rm -rf /etc/cni/net.d/*    # all hosts
reboot                     # all hosts

After the hosts have been rebooted, all pods that require CNI will be in a Pending or Unknown state. Now install the Cilium command-line interface (CLI) on your master host as shown in the quick-start of the official Cilium documentation, then create a values.yaml file that will be used to configure the Cilium installation (Listing 2).

Listing 2

values.yaml

cluster:
  name: kubernetes
k8sServiceHost: 10.124.0.3
k8sServicePort: 6443
kubeProxyReplacement: strict
operator:
  replicas: 1
ipam:
  mode: "cluster-pool"
  operator:
    clusterPoolIPv4PodCIDRList: ["192.168.0.0/16"]
routingMode: tunnel
tunnelProtocol: vxlan
encryption:
  enabled: false
  nodeEncryption: false
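If the Cilium CLI isn't installed yet, the quick-start approach boils down to downloading the latest stable release and unpacking it into your PATH. A sketch of that step (the download URLs and checksum verification details are in the official docs and may change, so check them before copying this):

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin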

Next, use the Cilium CLI to install Cilium:

cilium install --values values.yaml --version 1.15.2
watch kubectl get po -A
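The Cilium CLI can also report when the agent and operator are healthy, which is a convenient alternative to watching the pod list:

cilium status --wait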

After the new Cilium agent pods are up and running, the pre-existing pods (e.g., the netshoot deployment's pod from the earlier test) will gradually return to the Running state. To re-run my performance test, I restarted the iPerf server in the pod of my netshoot deployment. It was contactable on the same Service IP address as before, but the KUBE-SERVICES iptables chain was gone, because Cilium had completely replaced Kube-Proxy and its iptables rules with its own eBPF programs (see the "How Network Plugins Use eBPF" box). Repeating the same iPerf bandwidth test gave marginally better results, at up to 1.8Gbps – again, taking care to ensure that the iPerf client pod was not running on the same worker node as the iPerf server.
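You can confirm the kube-proxy replacement from inside any Cilium agent pod; the pod name below is a placeholder for one shown by kubectl -n kube-system get pods:

kubectl -n kube-system exec cilium-xxxxx -- cilium-dbg status | grep KubeProxyReplacement
kubectl -n kube-system exec cilium-xxxxx -- cilium-dbg service list

The second command lists the Service-to-endpoint mappings that Cilium now maintains in eBPF maps instead of iptables rules.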

How Network Plugins Use eBPF

The eBPF technology allows programs to be run in kernel space without any need to recompile the kernel or compile and load a kernel module. Network plugins such as Cilium can leverage eBPF to execute packet-routing code directly in the network device drivers, reducing the need to analyze packets elsewhere in the operating system. For example, Cilium can replace the kube-proxy addon (which manages mappings between Service and Pod IPs by means of iptables rules) with an eBPF program that does the same job entirely in the kernel.
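On a node running Cilium, you can see this arrangement with standard kernel tooling. A quick sketch, assuming bpftool is installed on the node and eth0 is the interface Cilium attaches to:

# eBPF programs currently loaded in the kernel (Cilium's names typically start with cil_)
bpftool prog show | grep -i cil
# programs attached at the traffic control (tc) layer of an interface
tc filter show dev eth0 ingress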

Enabling WireGuard Encryption in Cilium

Up to this point, the encapsulation mode for internode traffic, for both the Flannel and the Cilium tests, has been standard VxLAN (as built into nearly every modern Linux kernel) with no encryption. Intercepting and decoding such traffic is trivial: Anyone able to capture VxLAN traffic on the underlay network can decode it with Wireshark's VxLAN dissector and see the plaintext content of the traffic between the pods (Figure 2). Many production Kubernetes workloads don't implement TLS at the pod level, expecting the cluster traffic to be secured automatically, but don't take that outcome for granted.

Figure 2: Wireshark decodes VxLAN traffic on the underlay network to see pod communications in plain text.
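To reproduce this yourself, capture the underlay traffic on one of the nodes and open the capture in Wireshark. A sketch, assuming eth0 is the underlay interface (the kernel's VxLAN driver, as used by Flannel and Cilium, defaults to UDP port 8472):

tcpdump -i eth0 -w vxlan.pcap udp port 8472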

Luckily, enabling WireGuard encryption in Cilium is easy. You could have enabled it from the outset in the encryption section of values.yaml; however, because Cilium is already running, you can just use

kubectl -n kube-system edit cm cilium-config

then add enable-wireguard: "true" to the data section, and restart the Cilium agent daemonset. After that, internode traffic will be encrypted by the kernel's WireGuard implementation and sent to the appropriate destination host on UDP port 51871. You can verify that encryption is in use with the cilium-dbg tool inside each Cilium agent pod:

kubectl -n kube-system exec cilium-6t6ss -- cilium-dbg status | grep Encryption
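Incidentally, if you prefer to script the ConfigMap change rather than edit it interactively, something like the following should do the same job (assuming the default cilium daemonset name):

kubectl -n kube-system patch cm cilium-config --type merge -p '{"data":{"enable-wireguard":"true"}}'
kubectl -n kube-system rollout restart ds/cilium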

After enabling encryption on my test cluster, I repeated the same iPerf bandwidth test. Bandwidth dropped drastically, to less than 400Mbps (Figure 3) because of the extra overhead in encrypting and decrypting the content of each encapsulated packet. As you can see, security has a price.

Figure 3: iPerf bandwidth results with Cilium and WireGuard encryption enabled.
