Adding high availability to a Linux VoIP PBX
NumberPlease
As more staff work from home, the importance of corporate phone systems has increased, along with demand for leading edge telephony features like follow-me, voicemail to email, secure video conferencing, and so on. As a result, many companies have upgraded their legacy phone systems to Voice over IP (VoIP), including changing to open source solutions like the private branch exchanges (PBXs) Asterisk [1] and FreeSWITCH [2].
At the same time, call centers have been migrating to VoIP at a rapid pace because of the feature and cost benefits associated with open source solutions. Even critical call centers (e.g., emergency services or high-volume retail order placement centers) have embraced open source VoIP, moving products like Asterisk and FreeSWITCH into the mainstream of telephony.
What all these environments have in common is that the PBX has now become mission critical, with little tolerance for downtime. An outage on one of these critical PBXs might be measured in thousands of dollars lost per minute – or even in lives lost. To ensure open source PBXs can meet the demands of mission-critical call centers, organizations are now adding high availability (HA) to their VoIP PBXs.
The same HA technology in use at these critical call centers is also freely available to home users and small offices. In this article, I explore how to create an HA cluster out of any two Linux-based VoIP PBXs; in particular, I demonstrate how to cluster two Asterisk or FreeSWITCH PBXs with the community (free) edition of a popular PBX clustering product from Telium [3].
Designing Your Cluster
Open source enthusiasts are proud to say that every need can be resolved by an open source package – including clustering. Although true, you have to acknowledge that an operating system (OS)-level cluster is quite different from an application-level cluster. Admins who tried to create application (PBX)-level clusters with available OS clustering/heartbeat packages from their distribution's repositories enjoyed a quick win – they built a cluster in a matter of minutes – but that cluster failed to provide resilient telephony services in real-world scenarios. A PBX cluster must monitor, measure, and control VoIP services (e.g., the Session Initiation Protocol, H.323 protocol, etc.), up- and downstream trunk performance, user agent availability, PBX switch functionality (not just checking whether a process is alive), resource availability, and so on.
The sophistication of PBX-specific clustering software is what allows your PBX to detect and recover from real-world failure scenarios and keep everything (files, directories, databases, etc.) safely in sync between nodes (i.e., a failing node must never be allowed to corrupt a healthy node). Most free and open source software (FOSS)-based solutions solve the synchronization problem with Network File System (NFS), Server Message Block (SMB), Distributed Replicated Block Device (DRBD), and other components to "share" file- and block-level resources and perform block-level copying of databases (which is dangerous).
In critical call centers, PBX clusters copy files and data from one node to the other only if the nodes are confirmed healthy. Additionally, SQL databases are synchronized by SQL transactions, which can be reversed if the connection fails midway through an update.
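The health-gated copy described above can be sketched in a few lines of shell. This is only an illustration of the idea, not how HAast/HAfs implements synchronization: the function name and the two health-check commands are hypothetical placeholders that you would supply yourself.

```shell
# Sketch only: run a sync command only when both nodes pass a health check.
# The check commands are hypothetical; supply your own.
# Usage: sync_if_healthy <local_check_cmd> <peer_check_cmd> <sync_cmd> [args...]
sync_if_healthy() (
    local_check=$1
    peer_check=$2
    shift 2
    # Refuse to copy anything if the source node is unhealthy.
    if ! $local_check; then
        echo "skip: local node unhealthy" >&2
        exit 1
    fi
    # Refuse to copy anything if the destination node is unhealthy.
    if ! $peer_check; then
        echo "skip: peer node unhealthy" >&2
        exit 1
    fi
    # Both nodes look healthy, so run the sync command as given.
    "$@"
)
```

A call might look like `sync_if_healthy check_local check_peer rsync -a /etc/asterisk/ 192.168.1.6:/etc/asterisk/`, where `check_local` and `check_peer` are scripts of your own that exit nonzero on failure.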
Do-it-yourself scripts and most FOSS packages provide a fast and easy way to "share" files but don't perform the more intelligent health-based synchronization. As well, these scripts and packages can't look deeply into the telephony environment to determine the health of the telephony infrastructure (they perform simplistic Linux process monitoring); this is where the commercial products come in. However, just because you pay for an HA product doesn't mean it works well. You might be surprised to find that some commercial HA products do little more than add or enable the use of FOSS packages already available for free from your Linux repositories.
Fortunately, a community edition (i.e., free) of Telium's PBX HA solution makes implementing a robust PBX cluster easy. The community edition is the same product as the commercial edition used in large critical call centers, just with some capacity and feature limitations that should have no bearing on small businesses or home users. You can download either the High Availability for Asterisk (HAast) [4] or High Availability for FreeSWITCH (HAfs) [5] product to match the PBX software you are running; their setup is identical, so this article applies to both.
Before installing the HA software, you have a decision to make regarding the severity and type of service failure that the telephony equipment and cluster must withstand. For example, if you just need to protect against computing equipment failure (e.g., disk or CPU dying), your entire cluster can reside on-premises, with both primary and backup PBXs sitting side-by-side. If your cluster needs to withstand a local outage (e.g., power outage to your building, Internet outage to your city block), you might choose to keep your primary PBX on-site and your backup PBX in another city or region. If you want to withstand just about any disaster, you might choose to keep your backup PBX in the cloud and keep your primary PBX on-site or optionally place it on a different cloud (e.g., primary on AWS, backup on Azure). These design decisions will increase the sophistication of your HAast/HAfs configuration. For the sake of this article, I'll assume the PBXs reside side-by-side on your desk.
As a best practice, separate the VoIP traffic and management traffic onto two separate networks, so each PBX will contain two network interfaces, one on each subnet. This "multihoming" step is where most users get stuck, but it's not complicated; just ensure that each interface is on its own subnet (Figure 1). The most common mistake is to put multiple interfaces of the same host on the same subnet (Figure 2), which causes routing confusion for the host.
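To check the "one subnet per interface" rule before you cable anything, a small helper can compare two addresses. The following is a minimal sketch: the function names are made up for illustration, and the /24 prefix length in the usage example is an assumption about the demonstration network.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1    # split the address on "." into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Print the network address (as an integer) for an address and prefix length.
network_of() (
    addr=$(ip_to_int "$1")
    prefix=$2
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    fi
    echo $(( $addr & $mask ))
)

# Exit 0 when two CIDR strings (e.g., 192.168.1.5/24) are on the same subnet.
same_subnet() (
    a=${1%/*}; ap=${1#*/}
    b=${2%/*}; bp=${2#*/}
    [ "$ap" -eq "$bp" ] && [ "$(network_of "$a" "$ap")" -eq "$(network_of "$b" "$bp")" ]
)
```

For example, `same_subnet 192.168.1.5/24 192.168.2.100/24 && echo "conflict"` prints nothing, because the management and VoIP addresses sit on different subnets.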
As you will see later in the article, the OS will control the management network interface card (NIC), and HAast/HAfs will control the VoIP NIC. This arrangement allows the cluster to move a shared IP address between the two nodes, so upstream and downstream devices don't see any change in the PBX IP address as the cluster transitions between the nodes.
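Conceptually, moving a shared IP address between nodes comes down to adding the address on one node, announcing it with a gratuitous ARP, and removing it from the other. The sketch below illustrates that idea with standard `ip` and `arping` (iputils) commands. It is not taken from HAast/HAfs, which handles this for you, and the function names, the `RUN` variable, and the /24 prefix are assumptions.

```shell
# RUN defaults to "echo", so the functions only print what they would do.
# Set RUN="" (as root) to execute the commands for real.
RUN=${RUN-echo}

# Claim the shared VoIP address on this node.
# $1 = address with prefix (e.g., 192.168.2.100/24), $2 = interface (e.g., eth1)
claim_voip_ip() {
    $RUN ip addr add "$1" dev "$2"
    # Send unsolicited (gratuitous) ARP so neighbors update their caches.
    $RUN arping -U -c 3 -I "$2" "${1%/*}"
}

# Release the shared VoIP address on this node.
release_voip_ip() {
    $RUN ip addr del "$1" dev "$2"
}
```

In the default dry-run mode, `claim_voip_ip 192.168.2.100/24 eth1` just prints the two commands, which is a safe way to see what a failover involves without touching a NIC that the cluster software is managing.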
Numerous other factors need to be considered when designing a more sophisticated VoIP cluster, and a good place to learn more is the VoIP-Info [6] or Server Fault [7] website. However, for a simple PBX cluster, the above information is sufficient to begin implementation.
Installing Prerequisites
For this example, I assume you are already running a Linux-based VoIP PBX. If not, this might be a good time to research Asterisk and FreeSWITCH. Both can be installed with a few simple package manager commands. You will also find configuration generators for both PBXs, which are essentially pretty GUIs to create the configuration files for you. A couple of popular (and fully open source) configuration generator packages are Issabel (for Asterisk) [8] and FusionPBX (for FreeSWITCH) [9].
Assuming you have your first PBX node up and running, the next step is to install the prerequisite packages. Here, I assume you are running Red Hat 7/CentOS 7, because those are the most popular distributions currently in use in the Linux PBX market; however, equivalent commands and package names exist for most major Linux distributions. (The HAast/HAfs installation guide provides commands specific to different Linux distributions if you need help.)
To install the essential packages, enter:
sudo yum install qt5-qtbase qt5-qtbase-mysql zip hdparm sqlite3 dmidecode ip nc iputils net-tools rsync telnet logrotate ntp
Next, download the HAast package (this example uses Asterisk), untar the package, and run the installation script to put all the package contents into place:
cd /usr/src
wget --content-disposition --no-check-certificate 'https://files.telium.io/getproduct?p=haast&v=2.6.10&a=x86_64&d=rh7'
tar xvf haast-2.6.10-x86_64-rh7.tar.gz
cd haast-2.6.10-x86_64-rh7
sudo ./install_files/updatefiles.sh
The exact version of HAast/HAfs available to download at the time you read this article may change, so be sure to visit the Telium website to get the URL for the latest package. Because the HAast program will be responsible for starting and stopping the PBX service (Asterisk in this case), you must disable the Asterisk service and enable the HAast service:
systemctl disable asterisk.service
systemctl enable haast.service
The cluster software is now installed and ready to configure. All of the configuration files reside in the /etc/xdg/telium directory and in the haast.d subdirectory therein. The default configuration file already copied into place is called haast.conf, with default settings almost ready to go. You just have to customize the configuration file to fit your particular network and Asterisk/FreeSWITCH environment.
Listing 1 shows only the haast.conf settings I have customized to match my demonstration environment and to make diagnostics easier, including:
- Enabling logging of all HAast messages
- Telling HAast about the management IP address of this node and the management IP address of the other node (so the nodes can talk to each other)
- Telling HAast to control a physical NIC (eth1) on this computer as the VoIP interface, and use IP address 192.168.2.100 (which will float between the two nodes)
Listing 1
haast.conf
[logging]
debug=all

[peerlink]
localaddress=192.168.1.5
remoteaddress=192.168.1.6

[voipnic]
type=physical
physicaldevice=eth1
address=192.168.2.100
Listing 2 shows only the /etc/asterisk/manager.conf settings that customize Asterisk's settings to match my haast.conf settings above.
Listing 2
/etc/asterisk/manager.conf
[general]
enabled = yes
port = 5038
bindaddr = 0.0.0.0
displayconnects=no

[haast]
secret = haast
deny=0.0.0.0/0.0.0.0
permit=192.168.1.0/255.255.255.0
permit=127.0.0.1/255.255.255.0
read = all
write = all
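After restarting Asterisk, you might confirm that the manager account in Listing 2 accepts logins by sending an Asterisk Manager Interface (AMI) Login action by hand. The helper below is a minimal sketch. The function name is made up, and it only builds the request text, which you then pipe to a tool such as nc.

```shell
# Print an AMI Login request followed by a Logoff.
# AMI actions are "Key: Value" lines ending in CRLF, closed by a blank line.
# $1 = AMI username (the [haast] section name), $2 = secret
ami_login_request() {
    printf 'Action: Login\r\nUsername: %s\r\nSecret: %s\r\n\r\n' "$1" "$2"
    printf 'Action: Logoff\r\n\r\n'
}
```

Running `ami_login_request haast haast | nc 127.0.0.1 5038` on the PBX should return a block containing `Response: Success` if the credentials and permit rules are correct. Some nc variants close the connection as soon as stdin ends, so you might need to add a short wait option to see the reply.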
The included haast.conf file is quite extensive, affording tremendous flexibility in designing a cluster; however, I suggest you don't touch other settings until you are more familiar with basic cluster operations. Of course, before you put a cluster into production you will want to tighten security, change default passwords, and reduce the level of logging.
This node is now ready to join the cluster, so you can move on to the second node. In general, I suggest you don't repeat the entire installation process on a second node; instead, just mirror the disk of the first node to the second node. If you are using virtual machines, this process is as easy as copying some files, but if you are setting up two identical physical boxes, I recommend you use the dd command across the network to clone the first system to the second system. Check out The Geek Diary website [10] for instructions on how to clone a Linux system across the network or consider using a tool like Ghost for Linux (G4L) [11]. Once cloned, differentiate the systems (i.e., set a unique hostname, IP address, etc.) and then reverse the local and remote settings in the haast.conf file.
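Reversing the local and remote settings on the cloned node can be done by hand or with a short sed pipeline. The sketch below assumes the key names and the `key=value` layout (no spaces around `=`) shown in Listing 1, and that these two keys appear only in the [peerlink] section. The function name is made up for illustration.

```shell
# Print a copy of haast.conf with localaddress and remoteaddress swapped.
# $1 = path to the configuration file; the original file is not modified.
swap_peerlink() {
    sed -e 's/^localaddress=/__TMP__=/' \
        -e 's/^remoteaddress=/localaddress=/' \
        -e 's/^__TMP__=/remoteaddress=/' "$1"
}
```

On the second node, you could run `swap_peerlink /etc/xdg/telium/haast.conf > /tmp/haast.conf.new`, review the result, and then copy it into place. The two lines come out in the opposite order, which should not matter in an INI-style file.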
Installing Optional Components
You can skip this part if you are just experimenting with your first cluster. However, HAast and HAfs include a lot of tools and diagnostics, a web interface, and other components that make management and operation of the cluster much easier.
One of the most important optional components is synchronization of data between cluster nodes, which ensures that changes you make on the active node always get copied to the standby node. HAast/HAfs includes prebuilt configuration files for synchronizing most Asterisk/FreeSWITCH distributions, as well as add-on packages, configuration generators (GUIs), and so on. For example, if you want HAast to synchronize your Asterisk configuration between nodes, just copy the associated sample configuration file into place,
cp sample_files/synchronizations/asteriskconfig.syncjob.conf /etc/xdg/telium/haast.conf.d/
and follow the associated steps described in the installation guide. For a simple or demo PBX, you might be able to skip automatic synchronization if you just copy your configuration files between nodes manually on initial setup or after changes.
The sample_files directory includes sample sensors, controllers, and more, which are useful if you want HAast to monitor network connections, upstream services, and so on. As well, HAast can control external devices such as Integrated Services Digital Network (ISDN) switches, network controllable outlets, and so on. As you get more comfortable with HAast/HAfs, you might want to add some of these to your configuration, but for the typical small business or home office user, you won't have to dig too deep into these optional components. Keep in mind that HAast/HAfs includes approximately 25 built-in internal sensors (always running) to monitor your cluster nodes, so adding external sensors is optional.