Open source multipoint VPN with VyOS
Connected Mesh
OSPF on Top
When the DMVPN connection between all peers is running smoothly, it is time for OSPF to manage the routing. OSPF forms a neighbor relationship between each spoke router and the hub router through the tunnel interface; the spoke then tells the hub about its local IP networks, and the hub relays this information to all other OSPF neighbors.
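The relaying step can be pictured with a small Python model. It is a sketch, not OSPF itself: real routers flood link-state advertisements and each one runs its own shortest-path calculation, whereas this model only tracks which spoke ends up knowing which network. The function relay_via_hub() and the site names are hypothetical, site 2 is invented for illustration, and the /24 prefix lengths are assumptions based on the host addresses in Listing 1.

```python
# Hypothetical model of hub-and-spoke route relaying in a DMVPN cloud.
# Each spoke advertises its local networks to the hub; the hub relays
# every network to all other spokes. Real OSPF floods LSAs and runs SPF;
# this only models who ends up knowing which prefix.


def relay_via_hub(spoke_networks):
    """Return the remote networks each spoke learns through the hub.

    spoke_networks maps a spoke name to the list of networks it advertises.
    """
    # Step 1: the hub collects every advertised network and its origin.
    hub_view = {}
    for spoke, networks in spoke_networks.items():
        for network in networks:
            hub_view[network] = spoke

    # Step 2: the hub relays each network to every spoke except its origin.
    return {
        spoke: sorted(net for net, origin in hub_view.items() if origin != spoke)
        for spoke in spoke_networks
    }


# Hypothetical sites and subnets (site 2 and all prefix lengths are invented).
sites = {
    "site 2": ["10.2.1.0/24"],
    "site 3": ["10.3.1.0/24"],
    "site 4": ["10.4.1.0/24"],
}

learned = relay_via_hub(sites)

if __name__ == "__main__":
    for spoke, networks in learned.items():
        print(spoke, "learns", ", ".join(networks))
```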
Spoke routers at the same site must also know each other and form an additional OSPF neighbor relationship over their LAN interfaces. Traffic uses this link when one of the VPN tunnels is unavailable.
OSPF needs to know the bandwidth of each network interface because it derives its link costs from that value. Best practice is to set exactly the same value on all interfaces attached to the primary DMVPN cloud. The same applies to the interfaces on the secondary DMVPN cloud, but their value must be smaller to signal the lower preference. For the LAN interface, it makes sense to use a value that reflects the physical speed of the interface, especially when additional OSPF neighbors are present on the local network.
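A short Python sketch shows why these values matter. OSPF turns bandwidth into cost by dividing a reference bandwidth by the interface bandwidth, so a smaller bandwidth means a higher cost and a less preferred path. The helper ospf_cost(), the integer division with a floor of 1, and all bandwidth figures are illustrative assumptions; implementations differ in their rounding details.

```python
# Sketch: how configured bandwidths turn into OSPF interface costs.
# Assumption: cost = reference bandwidth // interface bandwidth, never below 1.
# Implementations differ in rounding; check your router's documentation.


def ospf_cost(bandwidth_mbps, reference_mbps):
    """Return the OSPF cost of an interface; both arguments in Mbit/s."""
    if bandwidth_mbps <= 0 or reference_mbps <= 0:
        raise ValueError("bandwidths must be positive")
    return max(1, reference_mbps // bandwidth_mbps)


# Hypothetical bandwidth settings (Mbit/s) for one spoke router.
bandwidths = {
    "primary DMVPN": 100,   # same value on every primary tunnel
    "secondary DMVPN": 50,  # smaller value -> higher cost -> less preferred
    "LAN": 1000,            # physical speed of a Gigabit Ethernet port
}

# With a 100 Gbit/s reference, all three interfaces get distinct costs.
costs = {name: ospf_cost(bw, 100_000) for name, bw in bandwidths.items()}

# With a 100 Mbit/s reference, the primary tunnel and the LAN both end up at 1.
costs_low_ref = {name: ospf_cost(bw, 100) for name, bw in bandwidths.items()}

if __name__ == "__main__":
    print(costs)          # {'primary DMVPN': 1000, 'secondary DMVPN': 2000, 'LAN': 100}
    print(costs_low_ref)  # {'primary DMVPN': 1, 'secondary DMVPN': 2, 'LAN': 1}
```

The second dictionary hints at a common pitfall: with a 100 Mbit/s reference bandwidth, the default in many OSPF implementations, a 100 Mbit/s tunnel and a 1 Gbit/s LAN interface both get cost 1. Raising the reference bandwidth consistently on all routers keeps the costs distinguishable.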
WAN Failure
The OSPF routers have learned every IP subnet from their peers over two different paths. The primary path uses DMVPN tunnel 1 and is installed in the routing table. The less preferred path over tunnel 2 is not discarded: It stays in the local OSPF database, waiting for a tunnel 1 outage, at which point it is promoted to the routing table.
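The split between database and routing table can be sketched as follows. This simplified Python model stores finished paths with total costs, whereas real OSPF keeps link-state advertisements and recomputes paths with an SPF run. The Path class, the routing_table() function, and the cost values are hypothetical; the next-hop addresses are the site 4 routers from Listing 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str    # destination subnet
    next_hop: str  # router that leads toward the destination
    via: str       # tunnel the path uses
    cost: int      # total OSPF cost (hypothetical values below)


def routing_table(database, failed_tunnels=frozenset()):
    """Install the cheapest usable path per prefix; all others stay in reserve.

    database stands in for the OSPF database and keeps every known path,
    including the ones that are not currently installed.
    """
    table = {}
    for path in database:
        if path.via in failed_tunnels:
            continue  # unusable while its tunnel is down
        best = table.get(path.prefix)
        if best is None or path.cost < best.cost:
            table[path.prefix] = path
    return table


# Hypothetical entries for the site 4 LAN, next hops taken from Listing 1.
database = [
    Path("10.4.1.0/24", "172.16.0.8", "tunnel 1", 1100),
    Path("10.4.1.0/24", "172.16.1.7", "tunnel 2", 2100),
]

normal = routing_table(database)
outage = routing_table(database, failed_tunnels={"tunnel 1"})

if __name__ == "__main__":
    print(normal["10.4.1.0/24"].via)  # tunnel 1
    print(outage["10.4.1.0/24"].via)  # tunnel 2
```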
During normal operation, all OSPF routers send Hello (keepalive) packets at regular intervals. If the main Internet link is lost (e.g., cut by a construction worker), both the local OSPF router and the DMVPN hub notice the missing Hellos and declare the neighbor down. The routers then remove all routes that use the unavailable neighbor from the routing table and search the OSPF database for alternatives. They are in luck, because every missing route is there, with the backup tunnel as the next hop. Finally, the routes over tunnel 2 move into the routing table, restoring connectivity to the other sites.
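The timing side of this failover can be modeled in a few lines of Python. The sketch assumes the 10-second Hello interval and 40-second dead interval that RFC 2328 suggests for LAN-type networks; DMVPN deployments often use different timers. The Neighbor class, active_next_hop(), simulate(), and the path costs are hypothetical, and the model shows a single router's view with both site 4 routers from Listing 1 as neighbors.

```python
# Sketch: a router notices a dead OSPF neighbor by missing Hellos and fails over.
# Assumed timers: 10 s Hello interval, 40 s dead interval (4 x Hello).

HELLO_INTERVAL = 10
DEAD_INTERVAL = 4 * HELLO_INTERVAL


class Neighbor:
    """Remembers when the last Hello arrived from one OSPF neighbor."""

    def __init__(self, last_hello):
        self.last_hello = last_hello

    def hello(self, now):
        self.last_hello = now

    def is_alive(self, now):
        # Declared down once a full dead interval passes without a Hello.
        return now - self.last_hello < DEAD_INTERVAL


def active_next_hop(routes, neighbors, now):
    """Return the cheapest next hop whose neighbor is still alive, or None.

    routes maps next-hop address -> path cost for one destination.
    """
    usable = {hop: cost for hop, cost in routes.items() if neighbors[hop].is_alive(now)}
    return min(usable, key=usable.get) if usable else None


def simulate(end=60):
    """The primary tunnel's Internet link is cut right after t=0.

    Returns (time, next hop) pairs sampled once per Hello interval.
    """
    neighbors = {
        "172.16.0.8": Neighbor(last_hello=0),  # site 4 router via tunnel 1
        "172.16.1.7": Neighbor(last_hello=0),  # site 4 router via tunnel 2
    }
    routes = {"172.16.0.8": 1100, "172.16.1.7": 2100}  # hypothetical costs
    history = []
    for now in range(0, end + 1, HELLO_INTERVAL):
        neighbors["172.16.1.7"].hello(now)  # only the backup keeps sending Hellos
        history.append((now, active_next_hop(routes, neighbors, now)))
    return history


if __name__ == "__main__":
    for now, hop in simulate():
        print(f"t={now:>2}s next hop {hop}")
```

Sampling once per Hello interval, the printout switches from 172.16.0.8 to 172.16.1.7 at t=40s, the first moment at which a full dead interval has passed without a Hello from the primary neighbor.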
This automatic failover is invisible to applications, but traceroute (Linux/macOS) or tracert (Windows) reveals the rerouting. Listing 1 shows a client at site 3 reaching site 4 with and without the primary VPN tunnel.
Listing 1
Solving a Single-Link Outage
# normal state: all links in working condition
traceroute -In 10.4.1.25
 1  10.3.1.21    # primary VPN router site 3
 2  172.16.0.8   # primary VPN router site 4
 3  10.4.1.25    # target host in site 4

# Problem: first link broken and network has converged
traceroute -In 10.4.1.25
 1  10.3.1.22    # backup VPN router site 3
 2  172.16.1.7   # backup VPN router site 4
 3  10.4.1.25
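To spot the rerouting from a monitoring script instead of by eye, it is enough to look at the first hop, as Listing 1 shows. The following Python sketch assumes numeric traceroute output (-n) with one hop per line that begins with the hop number; hop lines that start with an asterisk instead of an address are skipped. The helper names and the address-to-path mapping are hypothetical, built from the addresses in Listing 1.

```python
import re

# Hypothetical mapping of site 3 first hops to VPN paths (addresses from Listing 1).
FIRST_HOP_PATHS = {
    "10.3.1.21": "primary VPN",
    "10.3.1.22": "backup VPN",
}

# A hop line starts with the hop number followed by a numeric IPv4 address.
HOP_LINE = re.compile(r"^\s*\d+\s+(\d{1,3}(?:\.\d{1,3}){3})\b")


def hop_addresses(output):
    """Return the responding address of each hop in traceroute -n output."""
    return [m.group(1) for m in map(HOP_LINE.match, output.splitlines()) if m]


def active_path(output):
    """Classify the path by its first hop; 'unknown' if it is not recognized."""
    hops = hop_addresses(output)
    return FIRST_HOP_PATHS.get(hops[0], "unknown") if hops else "unknown"


# Hop lines from the second run in Listing 1 (comments and timings left out).
after_outage = """\
 1  10.3.1.22
 2  172.16.1.7
 3  10.4.1.25
"""

if __name__ == "__main__":
    print(hop_addresses(after_outage))  # ['10.3.1.22', '172.16.1.7', '10.4.1.25']
    print(active_path(after_outage))    # backup VPN
```

On a live system, the input could come from subprocess.run(["traceroute", "-n", "10.4.1.25"], capture_output=True, text=True).stdout.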
LAN Failure
If the primary tunnel fails and the LAN also runs OSPF, OSPF informs all neighbors (Figure 3); otherwise, you need a first hop redundancy protocol such as the Hot Standby Router Protocol (HSRP), Virtual Router Redundancy Protocol (VRRP), Common Address Redundancy Protocol (CARP), or Gateway Load Balancing Protocol (GLBP). Because HSRP and GLBP are Cisco proprietary and Cisco does not implement CARP, the lowest common denominator between VyOS and Cisco is VRRP.
VRRP uses an additional virtual IP address that the routers share. Clients use this address as their default gateway. For this to work, the VRRP routers must agree on which device is responsible for the virtual address, so they elect a master and a backup. The master handles the routing and periodically sends heartbeat packets (VRRP advertisements) to the backup router. The backup router stays passive and listens for these heartbeats. If they stop arriving, it assumes the master has died and takes over the virtual address. Clients do not need to change anything during failover or failback.
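The heartbeat timeout has a precise definition in VRRPv2 (RFC 3768): a backup waits Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, with Skew_Time = (256 - Priority) / 256 seconds, before it claims the virtual address. The Python sketch below computes this for the RFC defaults of a 1-second advertisement interval and a backup priority of 100. The function names are hypothetical, and VRRPv3 (RFC 5798) scales the skew by the advertisement interval instead.

```python
# Sketch of the VRRPv2 failover timer from RFC 3768: a backup router takes
# over after Master_Down_Interval seconds without an advertisement.


def skew_time(priority):
    """Skew_Time = (256 - Priority) / 256 seconds."""
    if not 1 <= priority <= 255:
        raise ValueError("VRRP priority must be 1-255")
    return (256 - priority) / 256


def master_down_interval(priority, advertisement_interval=1):
    """Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time."""
    return 3 * advertisement_interval + skew_time(priority)


def backup_takes_over(last_advert, now, priority, advertisement_interval=1):
    """True once the backup has waited a full Master_Down_Interval."""
    return now - last_advert >= master_down_interval(priority, advertisement_interval)


if __name__ == "__main__":
    # RFC defaults: backup priority 100, advertisement interval 1 s.
    print(master_down_interval(100))       # 3.609375
    print(backup_takes_over(0, 3.5, 100))  # False: still waiting
    print(backup_takes_over(0, 3.7, 100))  # True: takes over the virtual IP
```

Because a lower priority produces a longer skew, a backup with a higher priority claims the address first when several backups are waiting.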
The router holding the primary VPN tunnel must win the election and become VRRP master. VRRP priority values steer the election, so give this router the higher priority to make it the master. Otherwise, outgoing and returning traffic pass through different routers, the routing becomes asymmetric, and troubleshooting gets really messy.
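How priorities decide the election can be sketched like this. Following RFC 3768, the highest priority wins and a tie goes to the router with the higher primary IP address; preemption and the reserved address-owner priority 255 are left out. The function elect_master() and the priority values are hypothetical, and the addresses are the site 3 routers from Listing 1.

```python
from ipaddress import IPv4Address

# Sketch of VRRP master election (RFC 3768): the highest priority wins,
# and a tie goes to the router with the higher primary IP address.
# Preemption, the address-owner priority 255, and timing are ignored here.


def elect_master(routers):
    """routers maps a router name to (priority, primary IPv4 address)."""
    return max(routers, key=lambda name: (routers[name][0], IPv4Address(routers[name][1])))


# Hypothetical priorities for the two site 3 routers.
tuned = {
    "primary VPN router": (200, "10.3.1.21"),
    "backup VPN router": (100, "10.3.1.22"),
}
untuned = {
    "primary VPN router": (100, "10.3.1.21"),
    "backup VPN router": (100, "10.3.1.22"),
}

if __name__ == "__main__":
    print(elect_master(tuned))    # primary VPN router
    print(elect_master(untuned))  # backup VPN router: higher IP wins the tie
```

The untuned case shows the trap: with equal priorities, the backup VPN router wins simply because 10.3.1.22 is higher than 10.3.1.21.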