Lead Image © Andrea Danti, 123RF.com

Business continuity management

Continuity Guaranteed

Article from ADMIN 30/2015
We take a look at the new business continuity service from Azure and show how to use it.

Azure Site Recovery is designed to help enterprises protect critical applications by coordinating the replication and restore process for physical or virtual computers. The service gives administrators the ability to use their own data center, a hosting service provider, or Azure as the replication location. This saves costs and overhead for setting up and managing a secondary site. Environments can be protected by policy-based replication of virtual machines.

Azure Site Recovery coordinates and manages ongoing replication of data by integrating existing technologies such as Hyper-V Replica, System Center, and SQL Server AlwaysOn. In this article, we show how the new business continuity service works and how to deploy it.

Creating good business continuity involves a fair amount of complexity; after all, the availability of the endpoints and maintaining productivity in case of failure are at stake. Business Continuity Management (BCM) can be simplified in various ways through the use of virtual machines. Whether this means creating and backing up different snapshots of a virtual machine or its operating system through the use of checkpoint technology or simply moving a virtual machine between virtualization hosts, all of these different functions add their own level of complexity. For example, how can you ensure that the virtual servers wake up in the right order on a different host in a different data center? Moreover, how will the underlying network configuration, IP addresses, and DNS cope with this?

Simplifications in the form of system monitoring and intelligence, in combination with automatic mechanisms, can be a relief. Since the release of Windows Server 2012 R2, Microsoft has offered a cloud service for BCM under the name of "Hyper-V Recovery Manager." The first version covered the enterprise-to-enterprise (E2E) scenario by orchestrating a failover of virtual machines from one Hyper-V host/cluster to another. In the current version, Microsoft has extended the service with recovery to the cloud or to a service provider, along with the ability to integrate third-party systems. This explains why the cloud service was renamed Azure Site Recovery (ASR).

Failover Scenarios with Azure Site Recovery

ASR supports multiple failover and recovery scenarios. It can initiate recovery from the primary, physical data center to Azure Infrastructure as a Service (IaaS); it can also initiate recovery to a secondary data center that you either operate yourself or that resides with a hosting service provider (Figure 1). This means that failover between data centers (E2E) and between a data center and Microsoft Azure (E2A) is possible. If you decide to rely on a cloud storage provider who can also host virtual machines, you can also recover your virtual disks via a storage provider (E2SP). If you are only interested in SQL Server, you can replicate your databases to Azure; the failover then occurs there. SQL Server AlwaysOn is used for the replication.

Figure 1: Azure Site Recovery manages a failover from the cloud either to a secondary site or to Azure.

If you use Azure as the failover location for your virtual machines, the service creates the synchronized disks on inexpensive Azure blob storage and even offers geo-redundant storage on request. Optionally, the virtual machine disks can be encrypted for storage with a key defined by the administrator. Even if the failover never happens, you still have the certainty that your data is secure in the cloud thanks to a key of your choice. Virtual disks are kept up to date by cyclical updates.

Hyper-V Replica is used as the underlying technology here; it is included in the hypervisor feature scope of Windows Server and also supports operations between hosts within your own data center. The synchronization interval can be set to 30 seconds, 5 minutes, or 15 minutes, and each cycle provides the remote replica with the required delta updates. Thus, administrators can select the synchronization period for each virtual machine individually, which defines both the potential data loss in a worst-case scenario and how quickly changes are replicated. As the intervals become shorter, the probability that you will need to replicate more data grows, and this can be a question of bandwidth.
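To get a feel for this trade-off, the following Python sketch estimates the delta size, the transfer time, and a rough worst-case data loss window for each of the three intervals. It is a back-of-the-envelope aid only, not part of ASR or Hyper-V Replica: the function name `replication_estimate`, the change rate, and the link speed are hypothetical, and the model assumes a constant change rate without compression or protocol overhead. It also does not capture the effect that longer intervals can coalesce repeated writes to the same blocks.

```python
from __future__ import annotations

from dataclasses import dataclass

# The three synchronization intervals named in the article, in seconds.
INTERVALS_S = (30, 300, 900)


@dataclass
class Estimate:
    interval_s: int            # configured synchronization interval
    delta_mb: float            # data that accumulates during one interval
    transfer_s: float          # time needed to ship that delta over the link
    worst_case_loss_s: float   # rough window of changes lost if the site fails


def replication_estimate(change_rate_mb_per_s: float,
                         link_mbit_per_s: float) -> list[Estimate]:
    """Estimate per-interval delta size and a rough worst-case loss window.

    Assumption (not from the article): a change written right after a
    cycle starts is only safe once the *next* delta has fully arrived,
    so the loss window is approximated as interval + transfer time.
    """
    if change_rate_mb_per_s < 0 or link_mbit_per_s <= 0:
        raise ValueError("change rate must be >= 0 and link speed > 0")

    link_mb_per_s = link_mbit_per_s / 8  # megabits -> megabytes
    results = []
    for interval in INTERVALS_S:
        delta_mb = change_rate_mb_per_s * interval
        transfer_s = delta_mb / link_mb_per_s
        results.append(
            Estimate(interval, delta_mb, transfer_s, interval + transfer_s)
        )
    return results


if __name__ == "__main__":
    # Hypothetical VM: 0.5 MB/s of changed data, 8 Mbit/s uplink.
    for e in replication_estimate(0.5, 8.0):
        print(f"{e.interval_s:>4}s interval: {e.delta_mb:7.1f} MB per cycle, "
              f"{e.transfer_s:6.1f}s transfer, "
              f"~{e.worst_case_loss_s:6.1f}s worst-case loss")
```

If the transfer time for a cycle approaches or exceeds the interval itself, the link cannot keep up with the change rate, which is the bandwidth question raised above.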

These features give admins several options: ASR does not actually understand the terms "primary" or "secondary" data center, which means that you can also create failover scenarios in the form of one or more hub-and-spoke setups, and you can more or less determine the failover targets on an individual virtual machine basis in the worst case. Smaller data centers can thus manage their workloads themselves and use a larger hub location somewhere in the region as a failover location if something goes wrong. This can be an Azure IaaS region or another, larger data center.

Azure Site Recovery Function

ASR is mainly controlled from the cloud. You handle all the failover and service configuration functions in the Azure management portal. The portal uses agents to communicate with the individual virtual machines and hypervisors. SSL certificates are used for identification. All communication is routed through HTTPS on port 443, which avoids many firewall problems.

Data protection officers and security officers should note that physical data is only stored on Azure in the case of an E2A scenario – that is, replication of virtual machines to Azure. In this case, the virtual hard disks are replicated to Azure and brought to life in case of failure. In all other scenarios, metadata handles failover control. Virtual machines or payload data from the virtual machines do not need to reside on Azure.

The installed agents regularly communicate the health and configuration status to the cloud-based service. This communication between the agents and Azure allows instructions for changing the synchronization configuration or the failover command to be transferred. In this way, Azure Site Recovery becomes a command center for recovery: The hypervisors, including the virtual machines, listen for commands from the cloud and can thus be restored at the push of a button at a different location through the Azure portal. This gives you more than a centralized management site for recovery and for running through recovery steps for test purposes; you can also offer recovery as a self-service, which is useful in the enterprise. If several departments are accustomed to managing their own workloads at the data center, you can assign them the right to restore their workloads independently of other services.

In most cases, the agent is only installed on the System Center Virtual Machine Manager (SCVMM) management server along with the required certificate. For branch offices, the best approach is to install the agent directly on the Hyper-V host. There is no need to configure anything on the virtual machine – at guest operating system level, that is.

Clear-Cut Initial Configuration

Although protecting your first VMs with ASR obviously involves a modicum of work, the configuration is amazingly simple and clear cut. Your task list will include enabling a vault, creating a certificate for secure communication, installing the agent, and then creating a virtual network in Azure.

To do this, switch to Recovery Services in the navigation of the Azure management portal. If you have not already created a vault, this is your first step. In the wizard, assign a name for the Site Recovery Vault and select the desired target region; Azure creates the vault within a few seconds.

The vault is now available below the Recovery Services menu item and can be selected for configuration there. This is where you choose the scenario, as described previously:

  • Between an on-premises VMM site and Azure
  • Between two on-premises VMM sites
  • Between an on-premises Hyper-V site and Azure
  • Between two on-premises VMware sites
  • Between two on-premises VMM sites with SAN array replication

In this example, we use Between an on-premises VMM site and Azure (Figure 2). You can download the agent required for VMM from the dashboard; at 7MB, the download is unlikely to be an obstacle.

Figure 2: Scenarios supported by Azure Site Recovery – An overview.

The agent installer asks for the required certificate and the vault name. You can generate and download a registration key directly in the Azure management portal. During the installation, the VMM service is briefly stopped, but it is back up again after a few minutes. After successfully completing the installation, you have a vault in Azure that is linked to the local SCVMM.

The next step would thus be protecting a local cloud. To do this, you need to open the SCVMM cloud properties in the SCVMM management console and select the cloud for protection by ASR. This gives you a granular approach to choosing which information to replicate to Azure at any time. You need to assign ASR an Azure storage account for the failover to Microsoft Azure; this is where the virtual machines will be stored later on. In the Azure management portal, ASR points out that components are missing, but you can integrate them with a single click on the message.

You need to create a virtual network for later communication in Azure – this allows the virtual machines to continue communicating with the local site in the case of disaster. This step is not integrated with SCVMM; you will need to create the network via the Azure management portal. To do so, go to the Networks menu item and create a new virtual network (VNet). A simple virtual network is fine for the time being.

The configuration includes, for example, setting up a site-to-site VPN connection between Azure and a local network. The VPN ensures a transparent connection to the virtual machine, although it is operated at a totally different location after the failover. Later, you can connect the virtual networks with virtual networks from SCVMM, to be able to assign the right networks to the virtual machines that have failed over.
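Before the first test failover, it can help to check that every protected virtual machine sits on a network that has an Azure counterpart. The following Python sketch illustrates the idea on plain dictionaries. It does not call SCVMM or Azure; all VM names, network names, and the helper `unmapped_vms` are hypothetical, and in practice you would have to export the real inventory yourself.

```python
from __future__ import annotations


def unmapped_vms(vm_networks: dict[str, str],
                 network_map: dict[str, str]) -> list[str]:
    """Return the VMs whose on-premises network has no mapped Azure network.

    vm_networks: VM name -> on-premises (VMM) network name
    network_map: on-premises network name -> Azure virtual network name
    """
    return sorted(
        vm for vm, network in vm_networks.items()
        if network not in network_map
    )


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    vm_networks = {
        "dc01": "corp-lan",
        "sql01": "corp-lan",
        "web01": "dmz",
    }
    network_map = {
        "corp-lan": "azure-vnet-corp",
        # "dmz" has deliberately been left unmapped.
    }
    for vm in unmapped_vms(vm_networks, network_map):
        print(f"{vm}: no Azure network mapped for '{vm_networks[vm]}'")
```

A virtual machine reported here would come up after a failover without a suitable network assignment, so closing such gaps belongs on the task list before the recovery plan is tested.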

The last step is to create the recovery plan (Figure 3). This is where you define when to start which virtual machine and what scripts or other input are required. This function is very powerful and is best handled in the scope of a brief brainstorming session. For example, in what order do the virtual machines start? Which virtual machines? Do you want to start a virtual machine in the cloud with a different configuration than in your own data center? You will find a more comprehensive step-by-step guide online [1].

Figure 3: Create a recovery plan.
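The question of start order in a recovery plan can be prepared offline. As a minimal sketch, assuming you can write down which virtual machine depends on which, the following Python code derives start groups from those dependencies with the standard library's `graphlib` module (Python 3.9 or later). The VM names and the function `boot_groups` are hypothetical, and the result is only a planning aid that you would still have to enter in the recovery plan by hand.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def boot_groups(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into start stages so that dependencies always boot first.

    depends_on maps each VM to the set of VMs that must be running
    before it starts. VMs within one group have no dependencies on
    each other and could start in parallel.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything startable right now
        groups.append(ready)
        sorter.done(*ready)                 # mark the whole stage as started
    return groups


if __name__ == "__main__":
    # Hypothetical three-tier application behind a domain controller.
    plan = boot_groups({
        "dc01": set(),
        "sql01": {"dc01"},
        "app01": {"dc01", "sql01"},
        "web01": {"app01"},
        "web02": {"app01"},
    })
    for number, group in enumerate(plan, start=1):
        print(f"Group {number}: {', '.join(group)}")
```

Circular dependencies surface as a `CycleError` at planning time, which is preferable to discovering them during an actual failover.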
