How to back up in the cloud
Rescue Approach
Backups in conventional setups are in most cases clear-cut. If you order an all-encompassing no-worries package from a service provider, you can naturally assume that the provider will also take care of backups. The provider then has the task of backing up all user data such that it is quickly available again in the event of data loss. Some technical refinements are required, such as incremental rolling backups for cases in which a specific database state needs to be restored. With this model, the customer only needs to trigger a recovery action when necessary.
This principle no longer works in clouds, because the classical supplier-customer relationship often no longer exists. From the customer's perspective, the provider merely supplies the platform; the website running in the cloud may have been programmed by an external company, and the customer is responsible for operating it. In the early years of the hype surrounding the cloud, many companies learned the hard way that Ops also means taking care of backups, which raises the question: How can backups of components running in the cloud be made as efficient as possible?
Backing up is also challenging from the point of view of the platform provider, because when it comes to backups, the cloud provider is not preparing for the failure of individual components. Backups are as much about protection against notorious fat fingering (i.e., the accidental deletion of data) as about classic disaster recovery. How can a new data center be restored as quickly as possible if a comet hits the old data center?
To begin with, you need to understand the provider's viewpoint: How do you ensure that backups are created efficiently and completely, so that a fast restore is possible?
The answer to this question is a counter-question: What has to be in a backup for the provider to be able to restore the data in as short a time as possible? Where every single file used to end up in some kind of backup, today it makes sense to differentiate carefully. Especially in cloud setups, automation is usually an elementary component of the strategy, which entails some new aspects when it comes to backup.
Automation is Key
When you plan and build a cloud setup, you cannot afford to do without automation. Smaller conventional setups might still be manageable by hand, but with huge cloud setups comprising many servers and the accompanying infrastructure, a manual approach is obviously useless. Even if a setup starts very small, automation becomes necessary as soon as it scales horizontally, which is exactly the purpose of clouds. In short, anyone who builds a cloud will want to automate right from the start.
A high degree of automation also means that individual components of the setup can be restored "from the can" in most cases. If the computer hardware dies or a hard disk gives up the ghost, the operating system can be restored from the existing automation without any problems after the defective components have been replaced. There is no need to back up individual systems in such setups.
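To illustrate the idea, a minimal Ansible sketch might simply re-apply the existing roles to a freshly reinstalled node; the role names and host group below are hypothetical placeholders for whatever your own automation provides:

---
# Sketch: re-apply the full configuration to a reinstalled compute node.
# Role names and the host group are placeholders, not a fixed convention.
- name: Rebuild a replaced compute node from automation
  hosts: compute_nodes
  become: true
  roles:
    - base_os            # users, SSH keys, repositories, sysctl settings
    - monitoring_agent   # re-register the node with monitoring
    - openstack_compute  # hypervisor, nova-compute, neutron agents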
Keeping the Boot Infrastructure Up and Running
For the process to work as described, several aspects of the operational strategy must be viable. First, the boot infrastructure needs to be available; it offers services such as FTP, NTP, DHCP, and TFTP (Figure 1). Only if those services do their job is the installation of new systems actually possible. At a data center, the boot infrastructure (usually one server is sufficient) ideally should be redundant – if necessary, with classical tools like Pacemaker.
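As a rough, hedged example of such redundancy, the following Ansible task hands a floating IP for the boot services over to Pacemaker; the resource name and address are invented for illustration:

---
# Sketch: let Pacemaker manage a floating IP for the boot services.
# The resource name and IP address are examples only.
- name: Make the boot infrastructure address highly available
  hosts: boot_infra
  become: true
  tasks:
    - name: Define the virtual IP as a Pacemaker resource
      ansible.builtin.command:
        cmd: pcs resource create boot-vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 op monitor interval=30s
      run_once: true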
If a provider wants to achieve geo-redundancy, it does not hurt to operate at least the boot infrastructure of a setup as a hot standby at a second data center because, if worst comes to worst, the functioning boot infrastructure is the germ cell of a new setup.
Therefore, the boot infrastructure is one of the few parts of the underlay that requires regular backups – at a different location. Note that the boot infrastructure should also be automated to the greatest extent possible: NTP, TFTP, DHCP, and so on can be rolled out easily by Ansible to localhost.
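To give a rough idea, such a playbook might look like the following sketch on a Debian or Ubuntu boot server; the package choices (dnsmasq for DHCP/TFTP, chrony for NTP) and the template names are assumptions, not prescriptions:

---
# Sketch: roll out NTP, DHCP, and TFTP on the boot server itself.
# Package selection and template names are installation specific.
- name: Provision the boot infrastructure on localhost
  hosts: localhost
  connection: local
  become: true
  tasks:
    - name: Install the required packages
      ansible.builtin.apt:
        name: [dnsmasq, chrony]
        state: present
    - name: Deploy the service configuration from templates
      ansible.builtin.template:
        src: "{{ item }}.j2"
        dest: "/etc/{{ item }}"
      loop:
        - dnsmasq.conf
        - chrony/chrony.conf
    - name: Make sure the services are running and enabled
      ansible.builtin.service:
        name: "{{ item }}"
        state: started
        enabled: true
      loop:
        - dnsmasq
        - chrony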
The required roles, if they are written in-house and not pulled from somewhere on the web, are mandatory parts of the underlay backup. Even if you use ready-made roles from the network, be sure to back up the Ansible playbooks and the configuration files of the individual services, because these are specific to the respective installation. It goes without saying that a Git repository is a useful choice for all of these files; ideally, it should be synchronized to another location outside the setup.
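How you synchronize the repository is up to you. As a simple, hedged example, a cron job scheduled by Ansible could mirror the repository to an off-site remote; the repository path and the remote name "offsite" are invented:

---
# Sketch: push the automation repository to an off-site mirror every hour.
# The repository path and remote name are examples.
- name: Keep a copy of the automation repository outside the setup
  hosts: localhost
  connection: local
  tasks:
    - name: Schedule an hourly mirror push
      ansible.builtin.cron:
        name: mirror automation repo off-site
        minute: "0"
        job: cd /srv/automation && git push --mirror offsite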
If the boot infrastructure can be restored quickly in the event of a disaster, a functional platform can be bootstrapped from it in no time at all, and operations can resume. However, a setup raised in this way at another location would be just the bare bones; the data would still be missing.
Persistent Data
As is often the case in a cloud context, it is essential to distinguish between two types of data. On the one hand, the cloud setup always contains the metadata of the cloud itself, including the full set of information about users, virtual networks, virtual machines (VMs), and comparable details. The other type of data that admins usually have to deal with in the cloud is payload data – that is, data that customers store in their virtual area of the cloud. Many types of backup strategies are available, but not all of them make sense.
Metadata backups are comparatively easy to solve. Most cloud solutions rely on databases in the background anyway, which is where they store their own metadata, and backing up MySQL, the database OpenStack uses out of the box, is a long-solved problem.
If you regularly back up the MySQL databases of the individual services, only a few special cases in OpenStack need to be handled separately. Sometimes software-defined networking (SDN) solutions use their own databases. Another topic occasionally forgotten is the operating system images in OpenStack Glance: If they do not reside on central storage and would otherwise have to be downloaded again from the web, backups should be created.
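As a sketch of what this could look like with Ansible, the following playbook dumps the service databases and archives a file-based Glance image store; the database names, the credentials file, and all paths are assumptions that depend on the installation:

---
# Sketch: dump OpenStack service databases and archive local Glance images.
# Database names, the credentials file, and paths are installation specific.
- name: Back up OpenStack metadata
  hosts: controller
  become: true
  tasks:
    - name: Dump the service databases
      community.mysql.mysql_db:
        state: dump
        name: "{{ item }}"
        target: "/backup/mysql/{{ item }}-{{ ansible_date_time.date }}.sql.gz"
        config_file: /root/.my.cnf
      loop:
        - keystone
        - nova
        - neutron
        - glance
        - cinder
    - name: Archive a file-based Glance image store
      community.general.archive:
        path: /var/lib/glance/images
        dest: "/backup/glance-images-{{ ansible_date_time.date }}.tar.gz"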
If you use Ceilometer, don't forget its database when backing up; otherwise, the historical records for resource accounting are gone, and in the most unfavorable case, that is conceivably not only a technical problem, but also – and above all – a legal one.
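What a Ceilometer backup looks like depends on the storage back end. The sketch below assumes the legacy MongoDB back end; deployments that keep their metering data in Gnocchi or another time-series store need a different approach:

---
# Sketch: dump a MongoDB-backed Ceilometer database.
# Assumes the legacy MongoDB back end; the target path is an example.
- name: Back up the Ceilometer metering data
  hosts: metering
  become: true
  tasks:
    - name: Dump the ceilometer database with mongodump
      ansible.builtin.command:
        cmd: mongodump --db ceilometer --out /backup/mongodb/{{ ansible_date_time.date }}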
Cloud services that can create backups for customers themselves also deserve special attention. This option is particularly common for Infrastructure as a Service (IaaS) offerings. If you implement Database as a Service (DBaaS) in OpenStack with Trove, you can have your databases backed up automatically at regular intervals (Figure 2). Providers then have a choice: either clearly communicate to customers that they themselves are responsible for permanent storage of these backups, or include the backups of such services in the provider's own backup.
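If the provider opts to fold these service backups into its own regime, a hedged sketch could simply trigger a Trove backup through the OpenStack client; the instance and backup names are invented, the client is assumed to be configured via clouds.yaml, and the exact argument syntax can vary between python-troveclient releases:

---
# Sketch: trigger a Trove (DBaaS) backup via the OpenStack client.
# Instance and backup names are examples; client syntax may vary by release.
- name: Trigger a DBaaS backup
  hosts: localhost
  connection: local
  tasks:
    - name: Create a backup of the "customer-db" instance
      ansible.builtin.command:
        cmd: openstack database backup create customer-db nightly-customer-db
      environment:
        OS_CLOUD: mycloud   # assumes a matching clouds.yaml entry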