Kubernetes networking in the kernel
Core Connection
Baseline CNI and Testing Bandwidth
For the practical part of this article, I'll use the iperf tool for a general indication of Cilium's performance compared with that of Flannel. iPerf is a common tool for testing network throughput by means of a "server" mode at one end of the connection under test and a "client" mode at the other. It's not particularly representative of real-world performance, but it's a reproducible test and gives hard numbers as output, so it's a useful indicator. Earlier, I mentioned the point in a kubeadm installation at which a network plugin is selected and installed. With my test cluster initialized with kubeadm, I installed Flannel directly from GitHub:
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
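Flannel runs as a daemonset, so a quick way to see whether its pods (and everything else in the cluster) have reached the Running state is a pod listing; the kube-flannel namespace below is what the upstream manifest creates:
kubectl -n kube-flannel get pods
kubectl get pods -A -o wide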
When the apply command completed successfully and all pods were in the Running state, I created a simple deployment with a netshoot container (Listing 1). The netshoot [2] image comes with every network troubleshooting tool you might need (e.g., tcpdump, iperf, netcat, and various DNS tools). I then exposed port 5001 (the iPerf server port) on that deployment with a Kubernetes service:
Listing 1
netshoot Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: netshoot
  name: netshoot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        command: ["/bin/bash"]
        args: ["-c", "tail -f /dev/null"]
kubectl expose deployment netshoot --port=5001 --target-port=5001
The point of adding the Service (rather than having the iPerf client connect directly to the iPerf server's pod IP) is to make sure that Kube-Proxy's iptables rules are included in the routing of the test traffic (for the Flannel test) and to see whether Cilium's alternative approach to service endpoint mapping has any notable effect. Running
iptables -t nat --list KUBE-SERVICES
will show the service mapping rules created by Kube-Proxy.
I exec'd inside the netshoot pod and started the iPerf server with:
iperf -s
In this state, iPerf listens for incoming test requests from iPerf clients. The iPerf client was then run as a temporary shell inside another netshoot pod. An important control for the experiment is that the iPerf client pod should be on a different worker node from the pod running the iPerf server so that the test traffic runs across the underlay network and therefore has to be encapsulated. The bandwidth results for Flannel (for my particular LAN conditions, with no special effort paid to optimizing maximum transmission unit (MTU) settings or anything else) were up to 1.7Gbps.
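For reference, one way to force that separation is to check which node the server pod landed on and then pin a throwaway client pod to a different node by name. A sketch only: worker-2 stands in for whichever worker is not hosting the server, and the 30-second test duration is arbitrary:
kubectl get pods -l app=netshoot -o wide    # note the server pod's node
kubectl run iperf-client --rm -it --image=nicolaka/netshoot \
  --overrides='{"apiVersion": "v1", "spec": {"nodeName": "worker-2"}}' \
  -- iperf -c netshoot -p 5001 -t 30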
Replacing Flannel with Cilium
The Cilium documentation describes a thorough process for migrating to the Cilium network plugin in production clusters where you don't want to disrupt existing workloads and you want to have connectivity between pre- and post-migration workloads [3]. If that's not a concern, you can migrate from Flannel to Cilium as follows:
kubectl delete -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
kubectl -n kube-system delete ds kube-proxy
kubectl -n kube-system delete cm kube-proxy
rm -rf /etc/cni/net.d/*    # all hosts
reboot                     # all hosts
After the hosts have been rebooted, all pods that require CNI will be in a Pending or Unknown state. Now install the Cilium command-line interface (CLI) on your master host as shown in the quick-start of the official Cilium documentation, then create a values.yaml file that will be used to configure the Cilium installation (Listing 2).
Listing 2
values.yaml
cluster:
  name: kubernetes
k8sServiceHost: 10.124.0.3
k8sServicePort: 6443
kubeProxyReplacement: strict
operator:
  replicas: 1
ipam:
  mode: "cluster-pool"
  operator:
    clusterPoolIPv4PodCIDRList: ["192.168.0.0/16"]
routingMode: tunnel
tunnelProtocol: vxlan
encryption:
  enabled: false
  nodeEncryption: false
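If the Cilium CLI is not yet on the master host, the quick-start essentially amounts to downloading a release tarball and unpacking it into the PATH. Roughly, and only as a sketch (the pinned version and the amd64 architecture here are examples; check the documentation for the current values):
CLI_VERSION=v0.16.4    # example pin; use the version recommended in the docs
curl -L --fail -O https://github.com/cilium/cilium-cli/releases/download/${CLI_VERSION}/cilium-linux-amd64.tar.gz
tar xzvf cilium-linux-amd64.tar.gz -C /usr/local/bin
cilium version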
Next, use the Cilium CLI to install Cilium:
cilium install --values values.yaml --version 1.15.2
watch kubectl get po -A
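In addition to watching the pod list, the Cilium CLI itself can wait for the deployment to become healthy and, optionally, run its built-in connectivity test:
cilium status --wait
cilium connectivity test    # optional; deploys short-lived test workloads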
After the new Cilium agent pods are up and running, the pre-existing pods (e.g., the netshoot deployment's pod from the earlier test) will gradually return to the Running state. To re-run my performance test, I restarted the iPerf server in the pod of my netshoot deployment. It was contactable on the same Service IP address as before, but the KUBE-SERVICES iptables chain was gone, because Cilium had completely replaced Kube-Proxy and its iptables rules with its own eBPF programs (see the "How Network Plugins Use eBPF" box). Repeating the same iPerf bandwidth test gave marginally better results, at up to 1.8Gbps – again, taking care to ensure that the iPerf client pod was not running on the same worker node as the iPerf server.
How Network Plugins Use eBPF
The eBPF technology allows programs to be run in kernel space without any need to recompile the kernel or compile and load a kernel module. Network plugins such as Cilium can leverage eBPF to execute packet-routing code directly in the network device drivers, reducing the need to analyze packets elsewhere in the operating system. For example, Cilium can replace the kube-proxy addon (which manages mappings between Service and Pod IPs by means of iptables rules) with an eBPF program that does the same job entirely in the kernel.
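You can inspect that replacement from inside one of the Cilium agent pods, where the eBPF service (load-balancer) map takes over the role of the old KUBE-SERVICES chain. A hedged example; the agent pod name is just a placeholder (list the cilium-* pods in kube-system to find yours), and the subcommand set can vary between Cilium versions:
kubectl -n kube-system exec cilium-6t6ss -- cilium-dbg bpf lb list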
Enabling WireGuard Encryption in Cilium
Up to this point, the encapsulation mode for internode traffic, for both the Flannel and the Cilium tests, has been standard VxLAN (as built into nearly every modern Linux kernel) with no encryption. Intercepting and decoding such traffic is trivial: Anyone able to capture VxLAN traffic on the underlay network can then decode it with the use of Wireshark's VxLAN analyzer to see the plain text content of the traffic between the pods (Figure 2). Many production Kubernetes workloads don't implement TLS at the pod level, expecting the cluster traffic to be secured automatically, but don't take that outcome for granted.
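To see this for yourself, capturing the encapsulated traffic on a node's underlay interface is all it takes. A sketch that assumes the Linux-default VxLAN port 8472 and an underlay interface named eth0 (adjust both for your environment); the resulting file can then be opened in Wireshark:
tcpdump -ni eth0 -w vxlan-capture.pcap udp port 8472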
Luckily, enabling WireGuard encryption in Cilium is easy. You could have enabled it from the outset in the encryption section of values.yaml; however, because Cilium is already running, you can just use
kubectl -n kube-system edit cm cilium-config
then add enable-wireguard: "true" to the data list, and restart the Cilium agent daemonset (a rollout sketch follows below). After that, internode traffic will be encrypted by the kernel's WireGuard VPN function and sent to the appropriate destination host on port 51871. You can also verify that encryption is in use with the cilium-dbg tool inside each Cilium agent pod:
kubectl -n kube-system exec cilium-6t6ss -- cilium-dbg status | grep Encryption
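The daemonset restart mentioned above can be done with a standard rollout, assuming the default daemonset name of cilium in the kube-system namespace:
kubectl -n kube-system rollout restart ds/cilium
kubectl -n kube-system rollout status ds/cilium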
After enabling encryption on my test cluster, I repeated the same iPerf bandwidth test. Bandwidth dropped drastically, to less than 400Mbps (Figure 3), because of the extra overhead of encrypting and decrypting the content of each encapsulated packet. As you can see, security has a price.