Bpfilter offers a new approach to packet filtering in Linux
New Filter
Netfilter [1], the most important tool on Linux for inspecting packets from the network, does not have a very good reputation. It is regarded as old-fashioned and inefficient, and the associated userspace tool iptables is considered clumsy and difficult to use. Many users have come to rely on third-party solutions that embellish iptables with an appealing GUI and hide the most egregious complications of netfilter from the admin's eyes, but the Linux world has long hoped for a better solution.
For many years, nftables has been considered the heir apparent to netfilter/iptables, but nftables has some issues of its own and hasn't really caught on (see the "What about nftables?" box).
What about nftables?
Another alternative to netfilter/iptables appeared a few years ago. Nftables introduced a kind of virtual machine in the kernel to check network traffic. The VM is the actual filter, based on rules defined by the admin. The rules for nftables use a different format from those for iptables, which led to significant resistance among many admins to even consider nftables. Anyone who has painstakingly built a complicated set of rules for iptables will not simply want to discard it and start over with another tool. However, nftables lacked a compatibility layer for iptables for a long time, as well as any functional GUIs that could generate nftables rules.
Nftables can now interpret and adequately implement iptables rules, but it still hasn't caught on. At the same time, nftables is seeing competition from an unexpected direction as BPF and bpfilter enter the scene.
Now another contender has appeared on the scene: the bpfilter project, which launched in 2018. Bpfilter isn't really ready for production use yet, but it represents an exciting development in the evolution of Linux. Linux insiders are thrilled that a replacement for iptables might be on the way, but they are also intrigued by bpfilter as a proof of concept for an underlying component that could revolutionize Linux development: the Berkeley Packet Filter (BPF) [2].
This article will introduce you to bpfilter and describe why kernel developers are excited about it. But to set the scene, I'll begin with a little background on iptables.
Background
The roots of iptables go back to 1999. Starting with Linux 2.4, iptables finally replaced ipchains, which in turn had replaced the old-fashioned ipfwadm a few years earlier. The coming and going of these network filters in Linux got on the nerves of quite a few people, which is one reason why iptables has stayed around so long. However, the main reason for iptables' longevity has simply been the lack of better alternatives.
Much of the negative response to iptables should really be directed at the netfilter stack, which is no longer even completely congruent with iptables. (See the box entitled "netfilter and iptables.") When the development of netfilter began at the end of the 1990s, most households still used 56K modems for online access. The ipchains packet filter, already available for Linux at that time, was slow, inefficient, and hard to maintain. Rusty Russell, who had been instrumental in the development of ipchains, identified the problem and promised improvements. Iptables was intended as a panacea for packet filtering: unlike its rather unloved predecessor, it was supposed to be faster and more efficient.
netfilter and iptables
Iptables and netfilter are often treated as two names for the same thing, but they are actually different tools with different (but complementary) roles.
Netfilter describes the kernel part, in other words, those modules in the Linux kernel that dock onto the network interfaces. The kernel routes incoming and outgoing data traffic through the netfilter layer, where individual modules can then manipulate it. Discarding or rejecting packets is not the only function of netfilter: Network Address Translation (NAT), for example, is also implemented in the kernel via the netfilter layer.
Iptables is a complementary command-line tool. Admins use the iptables command to edit the rules for the different modules in the kernel's netfilter layer. And because the netfilter modules only provide basic functionality out of the box, admins have to define the rules themselves.
The name iptables points to how these rules are stored kernel-side: in the form of tables. Netfilter works through the rules stored in these tables to find matching packets, the principle being that the first applicable rule is used (first match wins). Performance requirements leave no other option: the kernel cannot browse through a huge set of rules from start to finish for every packet in a meaningful amount of time.
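A minimal sketch illustrates the first-match principle (the addresses and port are placeholders):

```shell
# Rules are evaluated top to bottom; the first match decides.
# Packets from 203.0.113.5 hit the DROP rule and never reach
# the broader ACCEPT rule that follows it.
iptables -A INPUT -s 203.0.113.5 -p tcp --dport 22 -j DROP
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
```

Swapping the order of the two rules would let the blocked host through, which is why rule ordering matters so much in large iptables setups.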
Iptables does not have a reputation for being particularly user-friendly. The extent of the disaster is quickly revealed to anyone who enters iptables-save on a Linux system: the command displays all the rules currently stored in the kernel on the console. It is not unusual for the number of complex rules with many parameters to reach four figures, so it is unsurprising that many admins do not manage their firewall rules with iptables itself but use graphical tools like ufw (Figure 1) or the firewall manager in YaST2 (Figure 2).
The way iptables stores rules in the kernel also shows that the solution is not necessarily the most efficient. If the admin changes a rule during operation or adds a new one, iptables cannot simply apply the change in netfilter. Instead, it first downloads the entire current set of rules from the kernel, makes the change, and then reloads all the rules. This process of downloading and uploading the rule set can be inefficient and time consuming.
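The dump-edit-reload cycle can be seen on the command line (a sketch; the file name is arbitrary):

```shell
iptables-save > /tmp/rules.txt      # dump the current ruleset from the kernel
vi /tmp/rules.txt                   # change or add rules in the text dump
iptables-restore < /tmp/rules.txt   # upload the complete set again
```

With thousands of rules, the round trip through userspace on every change is exactly the overhead described above.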
Through the years, developers have taken some steps to optimize netfilter and iptables. At the beginning of iptables development, for example, firewall rules were stored in plain text. Today, rules can be stored in IP sets (i.e., hash tables), which the kernel can process far faster during packet inspection. But given the traffic volumes of modern systems, netfilter still finds it difficult to keep up.
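An IP set replaces a long list of per-address rules with a single hash lookup. A sketch (the set name and addresses are placeholders):

```shell
ipset create blocklist hash:ip       # a hash table keyed by IP address
ipset add blocklist 198.51.100.7
ipset add blocklist 198.51.100.8
# One iptables rule consults the whole set in a single match:
iptables -A INPUT -m set --match-set blocklist src -j DROP
```

Adding or removing an address only touches the hash table; the iptables ruleset itself stays unchanged.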
Unfortunately, the Linux developers failed to provide iptables with an architecture capable of adapting gracefully to a vastly increasing number of packets. Of course, no one foresaw back in 1998 that online access speeds would increase from 100Mbit per second to 100Gbit per second in 2018.
In principle, netfilter has to deal with four types of data traffic: It is responsible for IPv4, IPv6, ARP, and the lesser known Ethernet (bridge) packets. Anyone who wants to implement a solution like netfilter today would probably build a generic filter device and then deal with the individual network protocols in the form of specialized modules. In contrast to this approach, iptables has buried much of the filter functionality so deeply in the modules for the individual protocols that they can hardly be meaningfully recycled elsewhere. This approach leads to a huge amount of redundancy in the kernel code and also makes the netfilter code quite difficult to maintain.
Understanding BPF
Linux admins have always envied colleagues who manage BSD and benefit from its superior packet filters. FreeBSD has used the well-known ipfilter for a long time, OpenBSD has PF, and NetBSD has NPF. In contrast to Linux, setting firewall rules in BSD is quite convenient. Usually a single file named pf.conf is all you need.
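A minimal pf.conf sketch shows the contrast in style (the interface name em0 is a placeholder):

```
# /etc/pf.conf -- deny incoming traffic by default,
# allow SSH in, and track outgoing connections
block in all
pass in on em0 proto tcp to port 22
pass out all keep state
```

A single, readable file loaded with pfctl replaces what would be a series of individual iptables invocations on Linux.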
What almost all BSD variants also have in common is that they support BPF. BPF is not a typical packet filter and has very little in common with netfilter/iptables. BPF is a VM in the kernel that interprets and executes a simple machine code.
The machine language used by BPF was specially developed for the tool. The BPF VM uses a custom instruction set and provides a kind of sandbox. At the beginning of its development, BPF was only intended as a little helper for tools like tcpdump. One feature of BPF is that it attaches itself to open network sockets in OSI Layer 2 and can accept code snippets in BPF machine language from programs such as tcpdump to search for specific packets. BPF is built into the kernel at a point where it sees the packets before the program that owns the socket sees them.
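The socket-attachment mechanism can be demonstrated from userspace. The following Python sketch (assuming a Linux system) builds the smallest possible classic BPF program, a single "return" instruction that accepts every packet, and attaches it to an ordinary, unprivileged UDP socket via the SO_ATTACH_FILTER socket option; real filters, such as those tcpdump generates, contain load and jump instructions that inspect packet headers first. The struct layouts mirror struct sock_filter and struct sock_fprog from <linux/filter.h>:

```python
import ctypes
import socket
import struct

BPF_RET = 0x06          # instruction class: return
BPF_K = 0x00            # operand source: constant
SO_ATTACH_FILTER = 26   # socket option number from <linux/filter.h>

# struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
# One instruction: "return 0xFFFFFFFF" -- accept the whole packet.
insns = struct.pack("HBBI", BPF_RET | BPF_K, 0, 0, 0xFFFFFFFF)
buf = ctypes.create_string_buffer(insns)

# struct sock_fprog { unsigned short len; struct sock_filter *filter; }
# Native alignment pads the short up to pointer alignment automatically.
fprog = struct.pack("HL", 1, ctypes.addressof(buf))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)
print("filter attached")
sock.close()
```

The kernel validates and copies the filter during the setsockopt() call; from then on, every packet destined for the socket passes through the BPF program before the application sees it.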
Because BPF is more of a VM than a concrete implementation of a set of rules, it creates a flexible framework for packet filtering. After the introduction of BPF, however, development went quiet for a while. BPF was used for its original intended use, but even though the tool has "packet filter" in its name, hardly anyone in the kernel maintenance team thought that BPF could one day be an alternative to netfilter.
In 2011, Eric Dumazet added a kernel patch that proved explosive. The patch expanded the kernel to include a just-in-time (JIT) compiler, allowing BPF to translate its program code directly into native machine code on supported architectures. This change offered massive performance benefits, and it brought BPF back into the developers' consciousness, causing something like a small revolution. Programs written in user space can now be compiled at runtime and executed safely inside the kernel, offering a convenient way for the operating system kernel to take on tasks defined in user space. JIT-compiled BPF is tantamount to introducing a microkernel-style architecture through the back door.
Since then, various companies have tuned the fast BPF VM in the kernel for various functions at the network level. Cilium, for example, uses BPF programs to create virtual network interfaces for containers directly in Layer 2, proving that BPF has advantages over traditional approaches. If your network card supports XDP offloading, the BPF program can be executed completely on the card's network processor and does not burden the system's main CPU (Figure 3).
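With a sufficiently recent iproute2, a compiled BPF object can be attached in offload mode from the command line (a sketch; the interface name and object file are placeholders):

```shell
# Load filter.o onto the NIC itself; this fails with an error if the
# card cannot offload. Replace "xdpoffload" with "xdpdrv" or
# "xdpgeneric" for driver-level or generic (CPU-based) attachment.
ip link set dev eth0 xdpoffload obj filter.o sec xdp
```

The three modes trade hardware requirements against performance: generic mode works everywhere, driver mode needs XDP support in the NIC driver, and offload mode needs a programmable network chip.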
Industry giants like Facebook and Google today make intensive use of BPF on Linux. Facebook, for example, has developed an option for implementing round-robin-style load balancing with BPF without relying on IPVS, which is noticeably outdated. Netflix uses BPF to trace underperforming network paths (Figure 4).
Google is working on bpfd, which is designed to make tracing remote targets on Linux more reliable than simple tools like traceroute are capable of today.
Cloudflare uses self-built filter apps for BPF to counteract DDoS attacks. And even the makers of the Open vSwitch SDN solution are working on enabling a BPF-based data path that could replace the typically clumsy bridge constructs. It quickly becomes clear: BPF is popular.
Kernel developers have continuously expanded BPF in recent years. Extended BPF (eBPF) offers significantly more functionality in Linux (Figure 5). The BPF implementation of Linux has gradually moved away from the original template.
bpfilter
Daniel Borkmann and Alexei Starovoitov created bpfilter as a proof of concept. Their premise was that, because BPF can execute virtually any code in the kernel, it can also execute genuine filter functions and thus serve as a full replacement for iptables. The patch presented by Daniel Borkmann in February 2018, and continuously developed since then, does not yet provide a complete replacement for netfilter.
Borkmann and Starovoitov promise full compatibility with iptables: if you rely on iptables to store firewall rules in the kernel, bpfilter will automatically translate those rules into BPF programs. The whole thing will be transparent from the administrator's point of view.
Some of the mainstays in the kernel developer team have their doubts about how easy it will be to make bpfilter compatible with iptables rules. Harald Welte, for example, has argued that iptables rulesets can be very complex today due to the various functional enhancements of recent years. The risk of overlooking something during re-implementation in bpfilter is thus high. Others argue that functions in bpfilter and netfilter that do not match 100 percent would create more problems in the long run than they solve.