High availability for RESTful services with OpenStack

Switchman

High Availability for the Load Balancer

The HAProxy setup presented here comes with a small drawback that the admin still has to take care of: Although the web server remains accessible as long as one back-end server is up, a failure of the load balancer would bring down the whole system. The task is thus to prevent the load balancer from becoming a Single Point of Failure (SPOF) in the installation, and at this point, Pacemaker comes back into play. If you are planning a load balancer configuration, you should be comfortable with at least the fundamentals of Pacemaker, unless you are using a commercial balancer solution that handles HA automatically.

Pacemaker also offers a very useful function for the web server processes on the back-end hosts: It can periodically check whether the processes are still running, and a clone directive lets this check run simultaneously on all back-end servers. Depending on the size of the setup, this monitoring provides true added value, but be careful: If you are using Pacemaker, 30 nodes is the maximum size for a cluster of back-end servers. A set comprising an Apache resource and a clone directive looks like the following:

primitive p_apache lsb:apache2 op monitor interval="30s" timeout="20s"
clone cl_apache p_apache

In practical terms, admins have two ways of keeping the load balancer from becoming a SPOF. Variant 1 envisages running the load balancer software in its own failover cluster and giving Pacemaker the task of ensuring HAProxy functionality. Using a clone directive, you could even run HAProxy instances on both servers and combine the installation with DNS round-robin balancing. This solution would let you avoid having one permanently idle node in the installation. Listing 2 contains a sample Pacemaker configuration for such a solution with a dedicated load balancer cluster. The big advantage is that the load caused by the balancers themselves remains separate and does not influence the servers on which the application is running.

Listing 2

Pacemaker with DNS RR HAProxy

primitive p_ip_lb1 ocf:heartbeat:IPaddr2 \
        params ip="208.77.188.166" cidr_netmask=24 iflabel="lb1" \
        op monitor interval="20s" timeout="10s"
primitive p_ip_lb2 ocf:heartbeat:IPaddr2 \
        params ip="208.77.188.167" cidr_netmask=24 iflabel="lb2" \
        op monitor interval="20s" timeout="10s"
primitive p_haproxy ocf:heartbeat:haproxy \
        params conffile="/etc/haproxy/haproxy.cfg" \
        op monitor interval="60s" timeout="30s"
clone cl_haproxy p_haproxy
location p_ip_lb1_on_alice p_ip_lb1 \
        rule $id="p_ip_lb1_prefer_on_alice" inf: #uname eq alice
location p_ip_lb2_on_bob p_ip_lb2 \
        rule $id="p_ip_lb2_prefer_on_bob" inf: #uname eq bob

Load Balancing via Software or DNS?

If you want to do without load balancing software, you can alternatively set up DNS-based round-robin balancing, but you should be aware of the disadvantages of such a solution: First and foremost, DNS entries cannot be changed on the fly. A DNS entry that has five A records will result in 20 of 100 clients receiving an error message in the event of a target server failure. You could work around the problem by managing the target IPs themselves in a Pacemaker cluster, so you never have a missing IP address.
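The 20-of-100 figure follows from plain round-robin arithmetic. A minimal Python sketch, assuming clients are spread evenly across the A records and do not retry another address (both simplifications); the function name is hypothetical:

```python
from fractions import Fraction


def failed_client_share(total_records: int, dead_records: int) -> Fraction:
    """Share of clients that land on a dead address under even round-robin.

    Assumption: every A record is handed out equally often and clients
    do not fall back to a second address.
    """
    if total_records <= 0:
        raise ValueError("need at least one A record")
    if not 0 <= dead_records <= total_records:
        raise ValueError("dead_records must be between 0 and total_records")
    return Fraction(dead_records, total_records)


# Five A records, one target server down, 100 clients.
affected = failed_client_share(total_records=5, dead_records=1) * 100
print(f"{int(affected)} of 100 clients receive an error")
```

With ten A records and one failed server, the same calculation yields 10 of 100 clients, so adding records reduces the damage but never removes it.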

Such a scenario also does not provide the opportunity to check the web server in the background via the load balancer itself. It's quite conceivable that httpd is not working, even though it can be reached on its usual IP address, because a problem exists locally on the target server. Load balancer programs often provide monitoring capabilities that go so far as to use HTTP to connect with the back-end server and check if the page returned in response to their request has the content that it should have. In this case, the admin would typically build a separate monitoring page that performs various checks in the background and then finally outputs "Everything is working." When the balancer receives this text, it knows that the back-end system is fine; otherwise, it automatically removes the system from the configuration.
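The content check described above can be sketched in Python. This is only an illustration of the logic, not how any particular balancer implements it: the function names and the example URLs are hypothetical, and the page fetcher is passed in so the check can be tried without a network.

```python
import urllib.request
from typing import Callable, List

# Text the monitoring page prints when all of its background checks pass.
EXPECTED_TEXT = "Everything is working."


def http_fetch(url: str, timeout: float = 5.0) -> str:
    """Fetch a page body over HTTP (real-world fetcher, standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def is_healthy(fetch_body: Callable[[str], str], url: str) -> bool:
    """True only if the page can be fetched AND contains the expected text."""
    try:
        return EXPECTED_TEXT in fetch_body(url)
    except OSError:
        # Refused connections, timeouts, and HTTP errors all count as "down".
        return False


def active_backends(fetch_body: Callable[[str], str], urls: List[str]) -> List[str]:
    """Keep only the back ends that pass the content check."""
    return [url for url in urls if is_healthy(fetch_body, url)]


# Demonstration with a fake fetcher; the URLs are made up.
def fake_fetch(url: str) -> str:
    pages = {
        "http://web1.example/health": "Everything is working.",
        "http://web2.example/health": "Database connection failed.",
    }
    if url not in pages:
        raise ConnectionRefusedError(url)  # simulates an unreachable host
    return pages[url]


backends = [
    "http://web1.example/health",
    "http://web2.example/health",
    "http://web3.example/health",
]
print(active_backends(fake_fetch, backends))
```

In the demonstration, web2 answers on its IP address but reports a local problem, and web3 is unreachable, so only web1 stays in rotation. Passing `http_fetch` instead of `fake_fetch` would run the same check against real servers.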

Despite the disadvantages, DNS load balancing means one less software component to maintain in your setup. DNS load balancing is thus well suited for small, simple services in particular. And, if you are building a very large setup, you might benefit from combining the two methods: It is conceivable, for example, to operate several active load balancers, which the clients reach via DNS LB. The balancers in turn form an HA cluster with Pacemaker, which takes care of the high availability of IP addresses.

Variant 2 assumes that one of the existing back-end servers additionally runs the load balancer. Again, combined with Pacemaker, this approach ensures that a load balancer is always running on one of the nodes and that the platform is thus available for incoming requests. Such a solution is especially useful if you have very little hardware available and the load caused by the balancer is negligible. However, a configuration with a separate balancer cluster is technically preferable.

OpenStack Example

Thus far in this article, I have mainly dealt with the question of how to achieve high availability for RESTful APIs using a load balancer. The method described previously leads to an installation featuring multiple servers with a complete load balancer setup including back-end hosts. The description so far, however, has ignored the specifics of individual RESTful solutions. Especially in a cloud context, where REST interfaces are currently experiencing a heyday, the use of these interfaces is often highly specific. OpenStack is a good illustration: Each of the OpenStack services has at least one API or is itself an API. What is commonly described as an "OpenStack cloud" is actually a collection of several components working together.

The main focus of OpenStack is on horizontal scalability. Users communicate constantly with the individual components, and the services themselves talk to one another. To allow this communication, the services need to know the addresses at which OpenStack components such as Glance and Nova reside. OpenStack's Keystone identity service is used for this task (Figure 4); Keystone maintains the list of endpoints in its database. An endpoint is what OpenStack developers call the URL of a RESTful API, through which the API accepts commands from the outside. The highlight: The endpoint database is not static. For example, to change the address at which Nova is reachable, you can simply redirect the endpoint in Keystone. This redirect provides much greater flexibility than hard-coded values in the configuration files would. By design, each client in OpenStack retrieves the endpoint of a service before it connects to that service.

Figure 4: Keystone is the telephone book of the OpenStack cloud. If a load balancer is running on the appropriate port on alice.local, the setup works as desired.

High availability for RESTful services using a load balancer requires more than just installing additional RESTful services and HAProxy. Luckily, any number of instances of the API services can exist simultaneously in OpenStack, as long as they access the same database in the background. Note, however, that the endpoint configuration in OpenStack's Keystone identity service must be adapted for HA. In concrete terms, if the IPs of the APIs themselves were registered in the endpoint database previously, you need to replace them with references to the load balancers. On receiving requests, Keystone thus directs the users to the load balancers, which then open the connection to the back-end servers, where the actual OpenStack APIs are running. The schema shown in Figure 5 can apply to both internal and external links.
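The effect of re-pointing an endpoint can be modeled in a few lines of Python. This is a toy sketch and not the Keystone API: the `EndpointCatalog` class, its methods, and the addresses are hypothetical, with a plain dict standing in for Keystone's endpoint database.

```python
class EndpointCatalog:
    """Toy stand-in for Keystone's endpoint database (not the real API)."""

    def __init__(self) -> None:
        self._endpoints = {}  # service name -> URL

    def set_endpoint(self, service: str, url: str) -> None:
        """Register or overwrite the URL under which a service is reachable."""
        self._endpoints[service] = url

    def lookup(self, service: str) -> str:
        """Return the currently registered URL for a service."""
        return self._endpoints[service]


def connect(catalog: EndpointCatalog, service: str) -> str:
    """Clients always ask the catalog first, then connect to what it returns."""
    return f"connecting to {service} at {catalog.lookup(service)}"


catalog = EndpointCatalog()

# Before HA: the endpoint is the address of one API server (made-up address).
catalog.set_endpoint("nova", "http://192.0.2.10:8774/v2")
before = connect(catalog, "nova")

# For HA: the endpoint is replaced with the address of the load balancer,
# which forwards requests to the API instances behind it.
catalog.set_endpoint("nova", "http://alice.local:8774/v2")
after = connect(catalog, "nova")

print(before)
print(after)
```

Because clients look up the endpoint before every connection, the change takes effect without touching any client configuration.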

Figure 5: What looks messy is actually the traffic that passes between the Keystone client and the server – in this case, an SSL authentication token. You can easily see that HTTPS is the protocol of choice.

Conclusions

The fact that cloud computing solutions such as OpenStack, Eucalyptus, and CloudStack rely on RESTful interfaces greatly facilitates high availability. A solution is available for any problem if you use HTTP as the underlying protocol. After all, HTTP has been around for 20 years, and someone is bound to have found the solution you need. When it comes to standalone APIs, all you need to achieve scale-out and HA are multiple web servers and a balancer. If you are looking for HA at the load balancer level, you can use Pacemaker and avoid the burden of a complex cluster configuration. If you already operate a cloud environment, follow the approach detailed in this article to retrofit high availability and thus secure your installations against failure.

The Author

Martin Gerhard Loschwitz works as a principal consultant at hastexo. His work focuses on high-availability solutions, and in his spare time, he maintains the Linux Cluster Stack for Debian GNU/Linux.
