Lead Image © Ying Feng Johansson, 123RF.com

High availability for RESTful services with OpenStack

Switchman

Article from ADMIN 18/2013
Admins can select from a cornucopia of options to make HTTP-based RESTful services highly available.

High availability has become a fundamental part of the server room. The cost of a single failure is often so significant that a properly implemented high-availability (HA) system is a bargain in comparison. Linux admins can use the Linux HA stack with Corosync and the Pacemaker cluster manager [1], a comprehensive toolbox for reliably implementing high availability. However, other solutions are also available. For example, if you need to ensure constant availability for HTTP-based services, load balancer systems are always recommended – especially for services based on the REST principle, which are currently very popular.

HA Background

High availability is a system state in which an elementary component can fail without causing extensive downtime. A classic approach to high availability is a failover cluster based on the active/passive principle. The cluster relies on multiple servers, ideally with identical hardware, the same software, and the same configuration. One computer is always active, and the second computer is on standby to take over the application that previously ran on the failed system. One elementary component in a setup of this kind is a cluster manager such as Pacemaker, which takes care of monitoring the servers, and if necessary, restarts the services on the surviving system.

For failover to work, a few system components need to collaborate. Installing the application on both servers with an identical configuration is mandatory. A second important issue is the data store: If the system is a database, for example, it needs to access the same data from both computers. This universal access is made possible by shared storage, in the form of a cluster filesystem like GlusterFS or Ceph, a network filesystem such as NFS, or a replication solution such as DRBD (if you have a two-node cluster). See the box called "SSL in a Jiffy" for a look at how to add SSL encryption to the configuration.
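As a rough sketch of how such a failover setup can look in Pacemaker, the following crm shell fragment groups a DRBD-backed filesystem with a database service so that both always run on the same node and start in order. All resource names, devices, and paths here are examples, not taken from a real deployment:

```
primitive p_fs ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/var/lib/mysql" fstype="ext4"
primitive p_mysql ocf:heartbeat:mysql
group g_db p_fs p_mysql
```

Pacemaker then mounts the filesystem before starting the database on whichever node is active and moves both resources together during a failover.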

SSL in a Jiffy

The procedure described in this article allows admins to set up RESTful components for high availability via load balancing. Virtually every load balancer offers the possibility of SSL encryption. The solution presented in this article, HAProxy [2], has had SSL support since version 1.5. The idea is actually quite simple: The connection between the client and the load balancer is encrypted; the connection between the balancer and the actual destination host is not. In most cases, it doesn't make any difference where the SSL connection terminates, as long as it does so within the target platform's control zone and not outside it.

Some applications rely on not needing to worry about SSL themselves – OpenStack Swift is a good example: The Swift service does have the option of using the swift-proxy RESTful component to deliver SSL certificates, but according to the documentation, this feature is only released for testing purposes. If you want to use Swift with SSL in production, you need to do so via a load balancer solution (Figure 1).
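A minimal HAProxy fragment for this scenario might terminate SSL in front of two swift-proxy instances. The IP addresses, the certificate path, and the assumption that swift-proxy listens on its default port 8080 are placeholders for illustration:

```
frontend swift_frontend
  bind 0.0.0.0:443 ssl crt /etc/haproxy/swift.example.com.pem
  mode http
  default_backend swift_proxies

backend swift_proxies
  mode http
  balance roundrobin
  server swift1 10.42.0.10:8080 check
  server swift2 10.42.0.11:8080 check
```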

Figure 1: Using a load balancer and Pacemaker, you can make the proxy server for your Swift storage highly available and help it scale well horizontally. Pacemaker handles the monitoring tasks.

Good Connections

Another issue involves connecting the client to the highly available services: A real HA solution should not require clients to change their configurations after the failover to connect to the new server. Instead, admins mostly work with virtual IPs or service IPs: An IP address is tied to a service and can always be found on the host on which this service is actually running at any given time. Whenever a client connects to this address, it will always find the expected service.
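In Pacemaker, such a service IP is typically implemented as an ocf:heartbeat:IPaddr2 resource; the following crm shell line is a sketch with a placeholder address and netmask:

```
primitive p_service_ip ocf:heartbeat:IPaddr2 \
    params ip="10.42.0.100" cidr_netmask="24" \
    op monitor interval="10s"
```

If the node carrying the address fails, Pacemaker raises the same IP on a surviving node, so clients continue to use a single, unchanged address.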

If the software uses stateful connections, that is, if the client-server connection exists permanently, the client should have a feature that initiates an automatic reconnect. The clients for major databases such as MySQL or PostgreSQL are just a couple of examples. Stateless protocols like HTTP are much easier to handle: Here the client opens a separate connection for each request to the server. Whether it talks to node A or node B for the first request does not matter, and if a failover occurs in the meantime, the client will not even notice.
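The retry behavior that makes stateless protocols so forgiving can be sketched in a few lines of Python. This is a simulation, not from the article: the send callable stands in for one independent HTTP request, and the first attempt failing models a node dying mid-failover.

```python
import time

def request_with_retry(send, retries=3, delay=0.0):
    """Retry a stateless request; each attempt is an independent
    connection, so a failover between attempts is invisible."""
    last_exc = None
    for _ in range(retries):
        try:
            return send()
        except ConnectionError as exc:
            last_exc = exc
            time.sleep(delay)
    raise last_exc

# Simulated back end: the first attempt fails (node A dies),
# the retry lands on the surviving node.
attempts = {"n": 0}
def send():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("node A went down")
    return "200 OK from node B"

print(request_with_retry(send))  # → 200 OK from node B
```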

A REST Load Balancer

RESTful services are the subject of much attention, and cloud computing has only increased the interest. More and more manufacturers of applications have stopped using their own on-wire protocol for communication between service and client; instead, they prefer to use the existing and ubiquitous HTTP protocol. Then, you "only" need a defined API for a client to call server URLs in a standard way, possibly sending specific headers in the process, and the server knows exactly what to do – lo and behold, you have a RESTful interface. Because HTTP is one of the most tested protocols on the Internet, smart developers put some thought into the issue of RESTful high availability a long time ago. The common solution for achieving high availability with HTTP services, and scaling out at the same time, is a load balancer.

The basic idea of a RESTful web load balancer is simple: A piece of software runs on a system and listens on the address and the port that actually belongs to the application. This software acts as the load balancer. In the background are the back-end servers, that is, the systems that run the actual web server. The load balancer accepts incoming connections and distributes them in a specified way to the back-end servers. This approach ensures an equal load with no idle systems.
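The distribution logic behind the round-robin strategy used later in Listing 1 can be sketched in a few lines of Python. This toy class is only an illustration of the scheduling idea, not how HAProxy is implemented:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Minimal sketch of round-robin distribution: each incoming
    request is handed to the next back end in turn."""
    def __init__(self, backends):
        self._ring = cycle(backends)

    def pick(self):
        return next(self._ring)

lb = RoundRobinBalancer(["10.42.0.1:80", "10.42.0.2:80", "10.42.0.3:80"])
print([lb.pick() for _ in range(4)])
# → ['10.42.0.1:80', '10.42.0.2:80', '10.42.0.3:80', '10.42.0.1:80']
```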

HAProxy is a prominent load balancer that lets you quickly create load balancer configurations (Figure 2). Listing 1 shows a sample configuration for HAProxy that terminates SSL on port 443 and forwards incoming requests to two back-end servers.

Figure 2: HAProxy is an application that does its job solely in user space, providing a clear statistics page that can also be used to enable and disable back ends.

Listing 1

haproxy.cfg

global
  log 127.0.0.1 local0
  maxconn 4000
  daemon
  uid 99
  gid 99

defaults
  log     global
  mode    http
  option  httplog
  option  dontlognull
  timeout server 5s
  timeout connect 5s
  timeout client 5s
  stats enable
  stats refresh 10s
  stats uri /stats

frontend https_frontend
  bind www.example.com:443 ssl crt /etc/haproxy/www.example.com.pem
  mode http
  option httpclose
  option forwardfor
  reqadd X-Forwarded-Proto:\ https
  default_backend web_server

backend web_server
  mode http
  balance roundrobin
  cookie SERVERID insert indirect nocache
  server s1 10.42.0.1:80 check cookie s1
  server s2 10.42.0.2:80 check cookie s2
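HAProxy's crt option expects the server certificate and its private key concatenated into a single PEM file. A typical way to build it looks like the following; the file names are examples:

```shell
# Concatenate certificate and private key into the PEM file HAProxy loads;
# restrict permissions, because the file contains the private key.
cat www.example.com.crt www.example.com.key > /etc/haproxy/www.example.com.pem
chmod 600 /etc/haproxy/www.example.com.pem
```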

If the back-end servers are configured so that a web server or a RESTful service is listening on each computer, a team comprising HAProxy and the back-end servers is already a complete setup. It is not important whether the RESTful service itself needs a web server as an external service, which is the case, for example, with a RADOS gateway (part of Ceph), or whether the service itself listens on a port, as in the case of OpenStack, where all API services take control over the HTTP or HTTPS port themselves (Figure 3).
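For an OpenStack API service that listens on its own port, the pattern is the same; the following hedged fragment balances two Keystone instances, with placeholder addresses and Keystone's conventional public port 5000:

```
listen keystone_api
  bind 10.42.0.100:5000
  mode http
  balance roundrobin
  server keystone1 10.42.0.21:5000 check
  server keystone2 10.42.0.22:5000 check
```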

Figure 3: Each service in OpenStack comes with an API; in the example, you can see the RESTful APIs Nova, Cinder, and Glance, as well as Quantum and Keystone.
