Running OpenStack in a data center
Operation Troubles
Ceph Tips and Tricks
If you want to build an OpenStack solution, you would do well to opt for Ceph as the storage back end initially, because it covers most daily needs and can be optimized through customization. A classic approach, for example, is storing the journals for the individual hard disks – object storage devices (OSDs) in Ceph-speak – on fast SSDs, which considerably reduces write latency and thus delivers good write performance across the cluster.
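Whether the SSD journals actually pay off can be checked at run time. The following minimal sketch assumes the ceph command-line client and an admin keyring are available on the node and reads the per-OSD latencies reported by ceph osd perf; the JSON field names match Jewel/Luminous-era releases and may differ in other versions, and the 50ms threshold is just an arbitrary example.

# Minimal sketch: report per-OSD commit/apply latency from "ceph osd perf".
# Assumes the ceph CLI and an admin keyring are present on this node; the
# JSON field names below match Jewel/Luminous-era output and may differ
# in other Ceph releases.
import json
import subprocess

def osd_latencies():
    raw = subprocess.check_output(["ceph", "osd", "perf", "--format", "json"])
    data = json.loads(raw)
    for osd in data.get("osd_perf_infos", []):
        stats = osd["perf_stats"]
        yield osd["id"], stats["commit_latency_ms"], stats["apply_latency_ms"]

if __name__ == "__main__":
    for osd_id, commit_ms, apply_ms in sorted(osd_latencies()):
        # 50 ms is an arbitrary example threshold for "suspiciously slow"
        flag = "  <-- check journal/SSD" if commit_ms > 50 else ""
        print(f"osd.{osd_id}: commit {commit_ms} ms, apply {apply_ms} ms{flag}")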
The factors already mentioned in the "Matching Hardware" section bear repeating: Ceph is not well suited to operation in a hyper-converged system. The storage nodes need to be independent, and they should talk to the OpenStack nodes over high-bandwidth network paths, so that the failure of a node and the recovery traffic it triggers do not block the network.
Many admins use dedicated servers for the monitor servers (MONs), the watchdogs that monitor the cluster and enforce its quorum; however, MONs require virtually no resources, so they can just as well run on the OSD nodes, which saves money.
On the other hand, you should not scrimp when equipping the cluster with disks. Many admins succumb to the temptation to use cheaper consumer devices instead of enterprise-grade SATA or SAS disks, which can turn out to be a stumbling block in everyday operation. Consumer disks differ from their enterprise counterparts especially in terms of failure behavior.
A dying consumer disk tries to keep working for as long as possible, whereas an enterprise disk declares itself dead as soon as it malfunctions seriously and completely denies service. In a Ceph cluster, you want the enterprise behavior: Ceph only notices the defect and triggers its redundancy measures once the disk has failed completely. A disk that is only partially broken can, in the worst case, cause hanging writes and makes troubleshooting considerably more difficult.
When it comes to Ceph and OpenStack, not all that glitters is gold, because the combination is fraught with technical problems: If you run a database in a VM on Ceph, for example, you are unlikely to be satisfied with the performance. The Controlled Replication Under Scalable Hashing (CRUSH) algorithm underlying Ceph is extremely prone to latency, and the small writes typical of databases unfortunately make this more than clear.
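To see the effect for yourself, you can time synchronous 4KB writes, which is roughly the I/O pattern of database commits. The following minimal sketch uses only the Python standard library; the test file path is an arbitrary example, and you would run it once inside a VM on an RBD-backed disk and once on local storage for comparison.

# Minimal sketch: time synchronous 4KB writes, the I/O pattern of database
# commits, to compare an RBD-backed VM disk with local storage. The test
# file path is just an example; point it at the filesystem you want to test.
import os
import time

PATH = "/var/tmp/latency-test.bin"   # example path, adjust as needed
BLOCK = b"\0" * 4096
ROUNDS = 1000

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
try:
    start = time.monotonic()
    for _ in range(ROUNDS):
        os.write(fd, BLOCK)      # O_DSYNC forces each write to stable storage
    elapsed = time.monotonic() - start
finally:
    os.close(fd)
    os.unlink(PATH)

print(f"{ROUNDS} synchronous 4KB writes: {elapsed * 1000 / ROUNDS:.2f} ms per write")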
The problem can only be circumvented by providing alternatives to Ceph, such as local storage or Fibre Channel-attached storage. However, this kind of setup would easily fill a separate article, and I can only point out here that such approaches are possible.
High Availability
Now that I've reached the end of this OpenStack series, I'll look at a topic that is probably familiar to most admins from the past. After all, high availability (HA) plays an important role in practically every IT setup. Redundancy has already been mentioned several times in the three articles – no wonder: Many components of a classic OpenStack environment are implicitly highly available. Ceph, for example, takes care of HA itself, in that several components need to fail at the same time to impair the service.
SDN is not quite as clear-cut: Whether and how well the SDN layer handles HA depends to a decisive extent on the solution you choose, but practically all SDN stacks have at least basic HA features and can cope with the failure of a gateway node, after which the networks assigned to the failed node can be moved to other nodes.
Is everything great in terms of HA, then? Well, not quite, because the services that define OpenStack itself also need HA. Most OpenStack services are now designed so that multiple instances can run in the cluster. However, one central piece is still missing: Although the API services can be started multiple times, clients cannot reasonably be expected to juggle many different IP addresses. The usual approach is to deploy a load balancer upstream of the API services that distributes incoming requests across the various back ends. Of course, the load balancer itself must also be highly available: If it is a normal Linux machine, a cluster manager like Pacemaker offers a useful approach; for commercial products, the manuals reveal how HA can be achieved.
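A simple availability check of the individual back ends is useful regardless of which load balancer you choose. The following minimal sketch polls a list of example Keystone endpoints with the Python standard library; the addresses and ports are assumptions and need to be adapted to your environment.

# Minimal sketch: check whether the individual API back ends behind the
# load balancer respond. The endpoint list is purely an example; Keystone
# answers on its version endpoint, other services analogously.
import urllib.request
import urllib.error

BACKENDS = [
    "http://192.168.122.11:5000/v3",   # example Keystone back ends
    "http://192.168.122.12:5000/v3",
    "http://192.168.122.13:5000/v3",
]

for url in BACKENDS:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"{url}: HTTP {resp.status}")
    except (urllib.error.URLError, OSError) as exc:
        print(f"{url}: DOWN ({exc})")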
MySQL and RabbitMQ
MySQL and RabbitMQ must also not be forgotten in terms of HA. Although they are not directly part of OpenStack, they are important parts of virtually any OpenStack installation. For MySQL, HA can be achieved by means of a cluster setup with a special load balancer like MaxScale (Figure 4), with MySQL Group Replication (a new feature as of version 5.7), or with Galera and its integrated clustering.
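If you opt for Galera, quorum can be verified through the wsrep status variables. The following minimal sketch assumes the third-party PyMySQL driver and example credentials; a cluster is healthy when its status is Primary and the node count matches what you deployed.

# Minimal sketch: verify that a Galera-based MySQL cluster has quorum by
# checking the wsrep status variables. Assumes the third-party PyMySQL
# driver (pip install pymysql); host and credentials are examples only.
import pymysql

conn = pymysql.connect(host="192.168.122.21", user="monitor",
                       password="secret", database="mysql")
try:
    with conn.cursor() as cur:
        cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'")
        size = int(cur.fetchone()[1])
        cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'")
        status = cur.fetchone()[1]
finally:
    conn.close()

print(f"Galera cluster: {size} node(s), status {status}")
if status != "Primary" or size < 3:
    print("WARNING: cluster degraded or without quorum")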
With RabbitMQ, the situation looks less encouraging: The service has a clustered mode, but it has proven unreliable on several occasions. Qpid could be a candidate as an alternative to RabbitMQ, but it does not have a convincing HA track record either. On the Advanced Message Queuing Protocol (AMQP) side, it is therefore – at least initially – more sensible to monitor the respective service (typically RabbitMQ) and alert the admin through a suitable monitoring system.
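A simple check of this kind can be built on the RabbitMQ management API. The following minimal sketch assumes the management plugin is enabled on the default port 15672 and uses example credentials; the /api/nodes endpoint lists the cluster nodes and reports whether they are running.

# Minimal sketch: query the RabbitMQ management API (plugin must be
# enabled, default port 15672) and report which cluster nodes are running.
# Host and credentials are examples only.
import base64
import json
import urllib.request

URL = "http://192.168.122.31:15672/api/nodes"
AUTH = base64.b64encode(b"monitoring:secret").decode()

req = urllib.request.Request(URL, headers={"Authorization": f"Basic {AUTH}"})
with urllib.request.urlopen(req, timeout=5) as resp:
    nodes = json.load(resp)

for node in nodes:
    state = "running" if node.get("running") else "DOWN"
    print(f"{node['name']}: {state}")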
Alternatively, an HA setup can be implemented with a service IP and Pacemaker, even if this means accepting the additional complexity that Pacemaker brings.