Scalable network infrastructure in Layer 3 with BGP
Growth Spurt
Large-scale virtualization environments have ousted typical small setups. Whereas a company previously purchased a few physical servers to deploy an application, today, the entire workload of a new setup ends up on virtual machines running on a cloud service provider's platform.
A physical layout is often based on a tree structure (Figure 1): The admin connects all the servers to one or two central switches and adds more switches when the ports on a switch run out. Together, the switches and network adapters form a large physical segment in OSI Layer 2.
In this article, I describe how you can build an almost arbitrarily scalable network for your environments with Layer 3 tools. As long as two hosts have any kind of physical communication path, communication on Layer 3 works, even if the hosts in question reside in different Layer 2 segments. The Border Gateway Protocol (BGP) makes this possible by letting each server know how to reach the other servers. The term "IP fabric" describes data center interconnectivity via IP connections.
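To make the Layer 3 idea more tangible, the following minimal Python sketch checks whether two hosts share a subnet and, if they do not, selects a next hop by longest-prefix match, which is how IP routing chooses among the routes that a protocol such as BGP distributes. All addresses, prefixes, and next-hop names are invented for illustration, and I assume one /24 subnet per Layer 2 segment. In a real setup, the routing table would be filled by a BGP daemon, not by hand.

```python
import ipaddress

# Hypothetical routing table: prefix -> next hop (all values are illustrative).
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "spine-1",
    ipaddress.ip_network("10.1.2.0/24"): "leaf-12",
    ipaddress.ip_network("0.0.0.0/0"): "default-gw",
}


def same_segment(a, b, prefixlen=24):
    """Return True if both hosts fall into the same subnet.

    Assumes one subnet of the given prefix length per Layer 2 segment.
    """
    net_a = ipaddress.ip_interface(f"{a}/{prefixlen}").network
    return ipaddress.ip_address(b) in net_a


def next_hop(destination):
    """Pick the most specific matching route (longest-prefix match)."""
    dst = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if dst in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(same_segment("10.1.2.10", "10.1.2.20"))  # hosts in the same /24
print(same_segment("10.1.2.10", "10.1.3.20"))  # hosts in different /24s
print(next_hop("10.1.3.20"))                   # routed via the broader prefix
```

The second pair of hosts cannot talk directly on Layer 2, but a route lookup still yields a next hop, so the packet finds its way as long as some physical path exists.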
Virtually
New setups in virtual environments deliver much shorter time to market for the customer: Admins no longer need to order the hardware and suffer annoying waits for delivery, installation, and roll-outs. Many benefits also arise for cloud operators: Virtual environments such as public clouds are far more uniform than a variety of individual setups and can be managed more efficiently. Also, horizontal scaling is easier because these platforms can be expanded almost at will.
The changes also affect planning in the IT environment. Previously, IT designed a single setup, built it, and operated it until a new solution replaced the old one. In contrast, massively scalable environments are designed not only for the next five years, but well into the future.
Size is another factor: A cloud environment starts life as a basic setup and grows continuously as user demand increases. When planning a public cloud, the planners do not know the target size and must be suitably cautious. If a company makes a design error, the consequences that appear later in everyday business can be fatal, forcing the company to put considerable effort into workarounds that compensate for the flaw.
Conventional wisdom says the earlier a design flaw is identified while planning a platform, the cheaper it is to remedy. According to a speech by Barry Boehm at EQUITY 2007 [1], the cost of working around design bugs after the requirements have been specified increases non-linearly as the project moves through design (5x), coding (10x), development testing (20x), acceptance testing (50x), and production (>150x). If the design bug is identified and removed in the design phase, the costs are manageable, but if the fault only becomes apparent when the platform is in production, the costs multiply (but see [2] for a dissenting opinion).
Toolbox
On the software side, admins now have a toolkit for building large environments. Cloud platforms like OpenStack and container-based solutions such as Kubernetes are built for scalability from the outset. Even off-the-shelf hardware that is not designed for horizontal scaling can, in many cases, be integrated into a scale-out setup by software: Ceph, for example, turns ordinary servers into a scalable object store that can provide multiple petabytes of capacity.
Scaling, however, still has one major challenge: the network. Clouds like OpenStack make demands on both the logical network and the physical network that conventional network designs can hardly meet. Whereas software-defined networking (SDN) has long since asserted itself in several variants for logical networks, the physical level can be a tight squeeze for several reasons.
Conventional Tree Structure
Typical network layouts do not work in massively scalable environments because they assume a known target size: When an enterprise plans the network for a classic standalone setup, the maximum size is known and usually limited to a certain number of servers. If more ports are required, the switch cascade continues on the underlying switch levels, which exposes the disadvantages of the tree structure. On the one hand, the admin is confronted sooner or later with the Spanning Tree Protocol (STP), of which long-suffering networkers can tell many a tale. On the other hand, only a fraction of the performance that the main switch could provide actually reaches the final members of such a cascade.
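The bandwidth loss in a cascade can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that every switch has 48 ports of the same speed, of which 47 point down and one serves as the uplink, and that all servers send traffic across the top switch at the same time. Real hardware and traffic patterns will differ.

```python
def worst_case_share(levels, downlinks=47, uplinks=1):
    """Fraction of one port's line rate left per server in the worst case.

    Each cascade level squeezes its downlinks into its uplinks, so the
    per-level ratios multiply. Port counts are illustrative assumptions.
    """
    ratio = min(1.0, uplinks / downlinks)  # a level never adds bandwidth
    return ratio ** levels


for levels in range(1, 4):
    share = worst_case_share(levels)
    print(f"{levels} cascade level(s): {share:.6%} of line rate per server")
```

Under these assumptions, a single level leaves each server 1/47 of a port's line rate, and a second level reduces this to 1/2209. Additional uplinks per switch improve the ratio but cost ports that are then no longer available to servers.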
In massively scalable environments, the central premise on which the tree approach described here is founded falls away: The target scale-out is completely unknown. A new customer might want to launch 600 virtual machines on the fly. Depending on the configuration, the provider then needs to add dozens of servers to the racks virtually overnight, because the customer will otherwise lease from Amazon, Microsoft, or Google.
At least the total number of required ports is a known value on which planners can base their calculations for the setup. If dozens of servers suddenly find their way into the data center, the network infrastructure needs to grow at the same rate, which cannot be done with tree-like setups and switch cascades.
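The capacity arithmetic behind such a request is easy to sketch. The planning figures below (20 virtual machines per server, two network ports per server, 48-port switches) are made-up assumptions, so substitute your own values.

```python
import math


def expansion_plan(vms, vms_per_server=20, ports_per_server=2,
                   ports_per_switch=48):
    """Estimate servers, switch ports, and switches for a VM request.

    All defaults are illustrative assumptions. Uplink ports are ignored,
    so the switch count is a lower bound.
    """
    servers = math.ceil(vms / vms_per_server)
    ports = servers * ports_per_server
    switches = math.ceil(ports / ports_per_switch)
    return {"servers": servers, "ports": ports, "switches": switches}


print(expansion_plan(600))
```

With these figures, 600 virtual machines translate into 30 additional servers and 60 switch ports, which already exceeds what a single 48-port switch can offer.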
Admins come under attack from another corner: It is by no means certain that, three years after the original setup, you will still be able to buy the network hardware on which you initially relied. Even with devices by the same manufacturer, later models are not guaranteed to be compatible with their predecessors. Detailed tests are therefore needed in such scale-out cases, just as they are when admins are looking to replace legacy network hardware with newer, more powerful components.
If you need to install devices from other manufacturers for a later scale-out, you risk a total meltdown: Although all the relevant network protocols are standardized, if you have ever tried to combine devices from different manufacturers, you are well aware that the interesting thing about standards is that there are so many of them.