Exploring OpenStack's Trove DBaaS Cloud Service
You can install databases such as MySQL, PostgreSQL, or even MongoDB very quickly thanks to package management, but installation is not even half the battle. A functioning database also needs user accounts and several configuration steps to improve performance and security.
This need for additional configuration poses challenges in cloud environments. In traditional settings, an admin can always set up a virtual machine and configure the database by hand, but cloud users want to generate an entire virtual environment from a template, where manual intervention is difficult or even impossible.
Furthermore, in today's IT environment, customers are not supposed to be troubled with setting up the database themselves. Users expect to be able to launch a service in the cloud with a mouse click.
These considerations have led to a new class of tools that fall under the name Database as a Service (DBaaS). The aim of DBaaS is to make it as easy as possible for cloud customers to use a database. Amazon has offered DBaaS in its cloud for years, and cloud solutions like OpenStack now have similar features. In this article, I present OpenStack's DBaaS solution, Trove.
Trove [1] has been around for several years. It had a difficult start, and its developers needed several attempts before Trove was accepted as an official part of the OpenStack program. Trove's declared goal is to hide the database's entire technical substructure from users: Customers just need a database, and how that database is implemented in the background remains hidden.
The Architecture of the Solution
Trove's design follows the guidelines used for other OpenStack services: The solution consists of an API that accepts commands and a component that executes them in the background. Like all OpenStack APIs, the Trove API follows RESTful principles and is accessed via HTTP. The Task Manager, which is the executive component, is linked directly to the API and receives the incoming API requests.
The Trove Conductor serves as the focal point for Trove's guest agents, which perform specific tasks within the VM. The Conductor acts as a proxy for communication between the guest agents in the VM and the Trove Task Manager. All communication with other OpenStack components takes place via calls to those services' APIs (Figure 1).
Users have two options for communicating with the API: the command-line interface (CLI) client or a plugin for the OpenStack dashboard, Horizon (Figure 2). As usual, the CLI client supports several commands that are not available in the web interface, so anyone who wants or needs Trove's full functionality cannot avoid the console.
Once the API receives a command, the Task Manager takes care of carrying it out. In most cases, this means starting a new VM and installing and configuring the requested database on it. The guest agent performs these steps, acting as an extension of the Task Manager within the VM.
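To make the division of labor more concrete, the following toy model traces one "create instance" request through the components described above. It is a minimal in-process sketch, not Trove code: All class and method names are hypothetical, and real Trove components are separate services that talk over a message queue, with Nova booting an actual VM where this sketch merely creates an object.

```python
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class Conductor:
    """Hypothetical stand-in for the Conductor: records status reports from agents."""
    statuses: dict = field(default_factory=dict)

    def report(self, instance_id: str, status: str) -> None:
        self.statuses[instance_id] = status


@dataclass
class GuestAgent:
    """Hypothetical stand-in for the guest agent running inside the database VM."""
    instance_id: str
    conductor: Conductor
    log: list = field(default_factory=list)

    def prepare(self, datastore: str, users: list) -> None:
        # Install packages, start the database, create users (as in the text).
        self.log.append(f"install packages for {datastore}")
        self.log.append(f"start {datastore}")
        for user in users:
            self.log.append(f"create user {user}")
        # The agent reports its status through the Conductor, not the API.
        self.conductor.report(self.instance_id, "ACTIVE")


@dataclass
class TaskManager:
    """Hypothetical stand-in for the Task Manager that carries out API requests."""
    conductor: Conductor
    agents: dict = field(default_factory=dict)

    def create_instance(self, instance_id: str, datastore: str, users: list) -> None:
        # Real Trove would ask Nova to boot a VM here; we only create an agent.
        self.conductor.report(instance_id, "BUILD")
        agent = GuestAgent(instance_id, self.conductor)
        self.agents[instance_id] = agent
        agent.prepare(datastore, users)


class TroveAPI:
    """Hypothetical stand-in for the REST API: it only accepts and queues requests."""

    def __init__(self, task_manager: TaskManager) -> None:
        self.pending: Queue = Queue()
        self.task_manager = task_manager

    def create_instance(self, instance_id: str, datastore: str, users: list) -> None:
        self.pending.put((instance_id, datastore, users))

    def dispatch(self) -> None:
        # Hand every queued request over to the Task Manager.
        while not self.pending.empty():
            self.task_manager.create_instance(*self.pending.get())


if __name__ == "__main__":
    conductor = Conductor()
    api = TroveAPI(TaskManager(conductor))
    api.create_instance("db1", "mysql", ["app"])
    print(conductor.statuses)  # {} because nothing has been dispatched yet
    api.dispatch()
    print(conductor.statuses)  # {'db1': 'ACTIVE'}
```

The point of the sketch is the direction of the calls: The API never touches the VM itself, and the agent's status flows back through the Conductor.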
Custom Images
The stock distribution images provided directly by Canonical or Red Hat do not include the Trove guest agent (Figure 3) and therefore do not work with Trove. To use Trove, you need an image that includes the guest agent. The Trove developers explain how to build Trove-specific images in a separate document [2] and provide the necessary tools (Figure 4).
The guest agent within the VM has several tasks. It installs all required packages and starts the database so that it operates according to the user's requirements. The agent also creates the database users that the user defined beforehand. How complex the configuration is depends on the database you select: Installing a MySQL instance requires fewer steps than installing a Redis cluster.
You specify the type of database when starting a Trove instance, and, depending on the configuration, the agent uses various templates to achieve the desired result. Trove thus relieves users of much of the work of setting up the database while still offering the options a manual setup in a virtual machine would provide: When starting a DBaaS instance, for example, the user chooses which hardware profile (flavor) the database runs on.
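The idea of picking a configuration template by database type and filling it in based on the hardware profile can be illustrated with a small sketch. The templates and sizing rules below are hypothetical simplifications chosen for illustration; Trove ships its own, more elaborate templates per datastore, and this example uses Python's `string.Template` only to stay dependency-free.

```python
from string import Template

# Hypothetical, heavily simplified per-datastore templates (not Trove's own).
CONFIG_TEMPLATES = {
    "mysql": Template(
        "[mysqld]\n"
        "innodb_buffer_pool_size = ${buffer_mb}M\n"
        "max_connections = ${max_connections}\n"
    ),
    "redis": Template("maxmemory ${buffer_mb}mb\n"),
}


def render_config(datastore: str, flavor_ram_mb: int) -> str:
    """Pick the template for the datastore and size it to the flavor's RAM."""
    try:
        template = CONFIG_TEMPLATES[datastore]
    except KeyError:
        raise ValueError(f"no template for datastore {datastore!r}") from None
    # Assumed sizing rules: half of the RAM for the main memory buffer and
    # 100 connections per full GiB of RAM (at least 100).
    buffer_mb = flavor_ram_mb // 2
    max_connections = 100 * max(1, flavor_ram_mb // 1024)
    # substitute() ignores keys a template does not use (e.g., for Redis).
    return template.substitute(buffer_mb=buffer_mb, max_connections=max_connections)


print(render_config("mysql", 2048))
```

With a 2048MB flavor, the MySQL template yields a 1024MB buffer pool and 200 connections; a larger flavor automatically produces a larger configuration without any change to the template.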
Users can also choose whether the database resides on a hypervisor's local storage or on a volume. Using volumes is a good idea, even if it can cost some performance. A DBaaS instance also offers the same options as a normal OpenStack instance: You can restart it, delete it, or change its hardware profile with a mouse click.
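Because the Trove API is RESTful, all of these choices (database type, flavor, volume, databases, and users) end up in a single JSON request. The following sketch builds, but does not send, such a "create instance" request with Python's standard library. The endpoint, token, flavor ID, and datastore version are placeholders; the version must match one actually registered in your cloud, and the body layout should be checked against the Trove API reference for your release.

```python
import json
import urllib.request

# Placeholder endpoint: The Trove API usually listens on port 8779, and the
# project ID is part of the URL. Adjust both for your cloud.
TROVE_ENDPOINT = "https://cloud.example.com:8779/v1.0/PROJECT_ID"


def create_instance_request(token: str, name: str, flavor_id: str, volume_gb: int,
                            datastore: str, version: str,
                            db_name: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a Trove instance."""
    body = {
        "instance": {
            "name": name,
            "flavorRef": flavor_id,         # hardware profile (flavor)
            "volume": {"size": volume_gb},  # volume size in GB
            "datastore": {"type": datastore, "version": version},
            "databases": [{"name": db_name}],
            "users": [{"name": user, "password": password,
                       "databases": [{"name": db_name}]}],
        }
    }
    return urllib.request.Request(
        f"{TROVE_ENDPOINT}/instances",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


req = create_instance_request("TOKEN", "shop-db", "2", 5, "mysql", "5.7",
                              "shop", "app", "s3cret")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen()` with a valid token would submit it; the CLI client and the Horizon plugin send equivalent requests on the user's behalf.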
The value of a solution like Trove stands or falls with the number of supported databases. Trove supports the most important members of the fraternity, including MySQL and PostgreSQL. Redis, CouchDB, and MongoDB do not cause Trove any problems, and MySQL was part of the "primordial soup": It was the first database Trove supported, and Trove still handles it best today.
An important representative of the enterprise market is missing, however: You will search in vain for the top dog, Oracle. Other popular relational databases, such as Microsoft's SQL Server or IBM DB2, are not included, either.
Orchestration and Clustering
Integration with other OpenStack services, especially the orchestration service Heat, is crucial to Trove's success, because a DBaaS is virtually useless if database instances can only be started and managed manually. The OpenStack developers are aware of the need for Heat support and added comprehensive Heat integration in OpenStack version 2014.1 "Icehouse." For example, the OS::Trove::Instance resource type is available in native Heat templates; it starts a DBaaS instance and provides it with the necessary credentials. Heat's Trove integration provides everything needed in everyday operation, although clusters of multiple database nodes pose a challenge of their own.
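A native Heat template that uses OS::Trove::Instance can be quite short. To stay in one language, the following sketch generates such a template as a Python dictionary and prints it as JSON. The property and attribute names follow the Heat resource reference for OS::Trove::Instance as I understand it, but older Heat releases may differ, so treat them as assumptions to verify; the flavor name, datastore version, and credentials are placeholders.

```python
import json


def trove_heat_template(flavor: str, size_gb: int, datastore_type: str,
                        datastore_version: str, db_name: str,
                        user: str, password: str) -> dict:
    """Return a minimal HOT template with one OS::Trove::Instance resource."""
    return {
        "heat_template_version": "2013-05-23",  # Icehouse-era HOT version
        "resources": {
            "db": {
                "type": "OS::Trove::Instance",
                "properties": {
                    "flavor": flavor,
                    "size": size_gb,  # volume size in GB
                    "datastore_type": datastore_type,
                    "datastore_version": datastore_version,
                    "databases": [{"name": db_name}],
                    # Credentials handed to the instance at creation time
                    "users": [{"name": user, "password": password,
                               "databases": [db_name]}],
                },
            },
        },
        "outputs": {
            # Assumed attribute name; check your Heat version's resource docs.
            "db_hostname": {"value": {"get_attr": ["db", "hostname"]}},
        },
    }


print(json.dumps(trove_heat_template("m1.small", 5, "mysql", "5.7",
                                     "shop", "app", "s3cret"), indent=2))
```

Generating the template programmatically is just a convenience here; the same structure is normally written by hand in YAML.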
Database solutions with their own cluster mechanisms are the most difficult scenario for any DBaaS. The Trove developers incorporated cluster functions in several places in OpenStack version 2015.1 (Kilo), and MongoDB is the textbook example for DBaaS clustering.
The Trove developers extended their API with database clustering functionality, and Trove comes with a Cluster instance type. If you start a cluster instance and specify all the necessary parameters, such as the total number of instances, Trove reliably takes care of the rest.
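In the REST API, a cluster request looks much like an instance request, except that it lists one entry per cluster member. The sketch below builds such a body for a three-node MongoDB cluster; it would be POSTed to the `/clusters` path of the Trove endpoint. The flavor ID, volume size, and datastore version are placeholders, and the exact body layout should be checked against the Trove API reference for your release.

```python
import json


def cluster_request_body(name: str, datastore: str, version: str,
                         flavor_id: str, volume_gb: int, count: int) -> dict:
    """Build the JSON body for a POST to <trove-endpoint>/clusters."""
    if count < 1:
        raise ValueError("a cluster needs at least one instance")
    return {
        "cluster": {
            "name": name,
            "datastore": {"type": datastore, "version": version},
            # One entry per cluster member; Trove handles the rest of the setup.
            "instances": [
                {"flavorRef": flavor_id, "volume": {"size": volume_gb}}
                for _ in range(count)
            ],
        }
    }


body = cluster_request_body("mongo-cluster", "mongodb", "3.2", "7", 2, 3)
print(json.dumps(body, indent=2))
```

The user only states how many members of which size are needed; wiring the members together into a working cluster is left to Trove.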
Trove copes well with the Galera MySQL clustering tool [3] and builds a functional cluster from multiple Galera instances. Trove's Galera support is limited to the database itself; other aspects of the Galera configuration are left to the user.