Highly available storage virtualization

Always On

Consistent HA Required

This level of investment in storage only makes sense if the server and network landscape is highly available as well. For servers, this is usually implemented as a cluster or MetroCluster configuration. As with storage, the servers and cluster nodes should be distributed across both locations or smoke zones. These hosts can be connected to the HA storage over the SAN in two ways, each with different consequences: local and cross-link disk path configuration.

In the local configuration, each cluster node is presented only with the LUNs at its own location (Figure 2). If a cluster node loses access to its LUNs, the applications must inevitably fail over to the other node, which has access to the mirrored LUNs at the other location.

Figure 2: With the local disk path configuration, each cluster node has access only to the LUNs in its data center. A failure of the LUNs at a location results in cluster panic.

In the cross-link configuration, the LUNs from the other location are also presented to all cluster nodes (Figure 3). If the local LUNs are lost, a node can access the LUNs at the other location through the SAN and the long-haul line without interruption and continue to work without a cluster failover. This configuration has two disadvantages. First, latency is higher in the event of an error. Second, even in normal operation, the cluster must be configured to use only the paths to its local LUNs; otherwise, it would permanently write to the LUNs at the other location with high latency.

Figure 3: In the cross-link disk path configuration, all cluster nodes have access to the LUNs in their own and the remote data center (here in active/active mode). A failure of the LUNs at one location does not cause cluster panic.
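The difference between the two configurations can be illustrated with a small decision model. This is only a sketch: the site labels "A" and "B", the `Outcome` enum, and the function `outcome_after_lun_loss` are hypothetical names introduced for illustration and do not correspond to any cluster product's API.

```python
from enum import Enum


class Outcome(Enum):
    CONTINUE_LOCAL = "continue on local LUNs"
    CONTINUE_REMOTE = "continue on remote LUNs (higher latency)"
    CLUSTER_FAILOVER = "fail over to the other cluster node"


def outcome_after_lun_loss(node_site, presented_sites, failed_site):
    """Return what happens to a node when all LUNs at failed_site are lost.

    node_site:       site where the cluster node runs (hypothetical label)
    presented_sites: sites whose LUNs are presented to this node
    failed_site:     site whose LUNs have become unavailable
    """
    surviving = set(presented_sites) - {failed_site}
    if node_site in surviving:
        return Outcome.CONTINUE_LOCAL
    if surviving:
        return Outcome.CONTINUE_REMOTE
    return Outcome.CLUSTER_FAILOVER


# Local configuration: the node at site A sees only the LUNs at site A.
local = outcome_after_lun_loss("A", {"A"}, failed_site="A")

# Cross-link configuration: the node at site A sees the LUNs at both sites.
cross = outcome_after_lun_loss("A", {"A", "B"}, failed_site="A")

print(local.value)
print(cross.value)
```

In this model, the only input that differs between the two calls is the set of presented sites, which is exactly what the disk path configuration controls.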

Asymmetric Logical Unit Access (ALUA), a standardized function of the SCSI-3 protocol that is supported by most operating systems, makes this possible in a very simple way. However, the storage solution must also provide the appropriate ALUA information to the server. If it does, the server receives the optimized path settings and uses only the data paths that the storage reports as optimized. The non-optimized paths remain on standby and are used only in case of an error. If the storage product does not support ALUA, the only options are to set up the preferred paths manually on the server or to use manufacturer-specific drivers or path failover software, which storage vendors provide for the common operating systems.

However, installing such special drivers usually requires a reboot of the server, and the drivers need version maintenance and compatibility checks if the operating systems are patched regularly.
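The path preference that ALUA enables can be sketched as follows. The state names "active/optimized" and "active/non-optimized" are ALUA states; everything else (the `Path` class, the path labels, and `select_paths`) is a hypothetical model for illustration and not the interface of any real multipath driver.

```python
from dataclasses import dataclass

# ALUA states considered usable here, ordered by preference (lower is better).
PREFERENCE = {"active/optimized": 0, "active/non-optimized": 1}


@dataclass(frozen=True)
class Path:
    name: str             # hypothetical path label, e.g. "hba0->siteA"
    alua_state: str       # state the storage reports for this path
    healthy: bool = True  # False if the path is currently unreachable


def select_paths(paths):
    """Return the healthy paths that share the best available ALUA state."""
    usable = [p for p in paths if p.healthy and p.alua_state in PREFERENCE]
    if not usable:
        return []
    best = min(PREFERENCE[p.alua_state] for p in usable)
    return [p for p in usable if PREFERENCE[p.alua_state] == best]


paths = [
    Path("hba0->siteA", "active/optimized"),
    Path("hba1->siteA", "active/optimized"),
    Path("hba0->siteB", "active/non-optimized"),
    Path("hba1->siteB", "active/non-optimized"),
]

# Normal operation: only the optimized (local) paths carry I/O.
normal = [p.name for p in select_paths(paths)]

# Simulated loss of the local LUNs: only the paths to site B remain healthy.
degraded_paths = [
    Path(p.name, p.alua_state, healthy="siteB" in p.name) for p in paths
]
degraded = [p.name for p in select_paths(degraded_paths)]

print(normal)
print(degraded)
```

The server never has to know which site is "local"; it simply follows the states reported by the storage, which is why a storage product without ALUA support forces manual configuration or vendor software.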

It is very important that the settings for the host bus adapter and path failover software are made exactly as specified by the storage virtualization manufacturer and that the redundant LUN paths on the servers are checked regularly. Both are prerequisites for uninterrupted operation of the applications. Even if certain LUN paths are not normally used actively, they must still be available in an error situation. It would be fatal if, in the event of a site failure, the cluster panic mechanisms did not work because of incorrectly set parameters or missing redundant LUN paths on the server, despite an expensive, highly available storage solution being in place.
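A regular check of the redundant LUN paths could look roughly like the following sketch. It assumes that a path inventory has already been collected as simple `(lun, site, healthy)` tuples; how that inventory is gathered depends on the operating system and multipath software and is not shown. The function name, the tuple layout, and the default of two paths per site are assumptions for illustration; the required values should come from the storage virtualization manufacturer.

```python
from collections import defaultdict


def find_redundancy_gaps(path_inventory, required_sites, min_paths_per_site=2):
    """Return {lun: [problem, ...]} for LUNs that lack enough healthy paths.

    path_inventory:     iterable of (lun, site, healthy) tuples (assumed format)
    required_sites:     sites from which each LUN must be reachable
    min_paths_per_site: assumed minimum number of healthy paths per site
    """
    counts = defaultdict(lambda: defaultdict(int))
    for lun, site, healthy in path_inventory:
        counts[lun]  # register the LUN even if none of its paths are healthy
        if healthy:
            counts[lun][site] += 1

    gaps = {}
    for lun, per_site in counts.items():
        problems = [
            f"site {site}: {per_site.get(site, 0)} healthy path(s), "
            f"need {min_paths_per_site}"
            for site in sorted(required_sites)
            if per_site.get(site, 0) < min_paths_per_site
        ]
        if problems:
            gaps[lun] = problems
    return gaps


inventory = [
    ("lun01", "A", True), ("lun01", "A", True),
    ("lun01", "B", True), ("lun01", "B", True),
    ("lun02", "A", True), ("lun02", "A", True),
    ("lun02", "B", True), ("lun02", "B", False),  # one remote path is down
]

gaps = find_redundancy_gaps(inventory, required_sites={"A", "B"})
for lun, problems in gaps.items():
    print(lun, problems)
```

Because the standby paths carry no I/O in normal operation, a failed remote path like the one on `lun02` would otherwise go unnoticed until the moment it is needed.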

Practicing Failure

Once a virtualization solution is implemented and in operation, you still need to be prepared for failures. It is important that the operating personnel are familiar with handling such a complex solution, even in an error situation. An operating manual and a contingency manual have often proved their value in such environments. The operating manual explains the basic functionality of the virtualization solution and describes daily tasks in detail.

The contingency manual is used in emergency situations. In addition to the most important emergency telephone numbers, it should also describe in detail all possible error states and recovery procedures.

An emergency always comes unexpectedly and thus generates panic and stress in many places. In a situation like this, an effective contingency manual is often worth its weight in gold. In this context, it also makes sense to simulate possible error states regularly as part of a disaster recovery test, to work through the recovery procedures, and to check that they are correct. After all, a storage solution changes constantly because of new software versions and functionalities, and the server landscape and SAN infrastructure are also subject to a regular update cycle, which can result in significantly changed failure behavior.

Watch for Pitfalls

Despite its many advantages, highly available storage virtualization also has some potential downsides. Once such a solution is established, it is very time-consuming to replace it with a solution from another vendor or to stop using virtualization altogether. If a company also uses the solution to virtualize storage from other manufacturers, the individual manufacturers might pass the buck among themselves in the event of a failure, so that the actual problem fades into the background and is resolved only slowly.
