Highly available storage virtualization
Always On
To ensure that companies have permanent access to their own data, many technologies for data storage and management have been established over the years. One solution is storage virtualization, which is a useful approach for coping with massive data growth. In this article, I provide an overview of basic technologies and explain how to implement highly available storage area network (SAN) scenarios.
Data Storage Virtualization
Much like server virtualization, data storage virtualization promises better utilization of resources, simplified central management, and increased data availability. Various technical approaches come together under the storage virtualization umbrella. Each adds a logical – virtual – layer to the storage environment that abstracts servers and applications from the actual physical storage, allowing this storage to be combined into larger areas or pools.
The server operating system itself offers a simple type of storage virtualization by grouping several individual physical hard drives, or logical unit numbers (LUNs), into volume groups and creating logical volumes or devices from them. The OS then addresses each logical volume as a single virtual hard drive. This type of storage virtualization is used very frequently, especially in Unix-style operating systems that come standard with a Logical Volume Manager. For higher data availability, these volumes can also be mirrored as RAID 1, which (assuming the corresponding SAN infrastructure is in place) keeps synchronized copies of the data at two different locations. However, centralized management, which allows all servers and the associated storage resources to be administered and monitored in one place, is not typically available.
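To make the mapping idea concrete, the following is a minimal sketch of how a volume manager might translate an address on a logical volume into a location on one of the underlying disks or LUNs. It is a toy model, not the implementation of any real Logical Volume Manager: the class names, the LUN names, and the sizes (counted in abstract "extents") are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PhysicalVolume:
    """A disk or LUN contributed to a volume group (hypothetical; size in extents)."""
    name: str
    extents: int


class LogicalVolume:
    """Linear logical volume: logical extents are laid out across the PVs in order."""

    def __init__(self, pvs):
        self.pvs = list(pvs)
        self.size = sum(pv.extents for pv in self.pvs)

    def locate(self, logical_extent):
        """Translate a logical extent number into (PV name, extent on that PV)."""
        if not 0 <= logical_extent < self.size:
            raise IndexError("extent outside logical volume")
        offset = logical_extent
        for pv in self.pvs:
            if offset < pv.extents:
                return pv.name, offset
            offset -= pv.extents
        raise AssertionError("unreachable")


# Two LUNs of different sizes appear to the OS as one 150-extent device.
lv = LogicalVolume([PhysicalVolume("lun0", 100), PhysicalVolume("lun1", 50)])
print(lv.size)         # 150
print(lv.locate(99))   # ('lun0', 99)
print(lv.locate(120))  # ('lun1', 20)
```

The application only ever sees extent numbers 0 to 149; which physical device actually holds a given extent is hidden behind the lookup. A mirrored (RAID 1) volume would resolve each logical extent to two such locations instead of one.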
SAN as a Pioneer
More complex types of storage virtualization occur within a SAN, which forms the basis for high bandwidths and shared storage. Two fundamentally different technologies are used: the out-of-band and the in-band methods. With out-of-band virtualization, the data and control information take different paths. However, the enormous complexity of such SAN solutions has kept the approach from establishing itself on the market, and very few products use it today. In comparison, many storage and virtualization manufacturers have in-band products in their portfolios. Here, the data flows from the server to its disks (LUNs) exclusively through the storage virtualization layer, which is designed as a storage server or virtualization appliance.
Storage subsystems with built-in virtualization functionality are a slightly different form of in-band virtualization. These storage systems have many terabytes of internal hard disk capacity and can also virtualize other storage systems. This "external storage," connected by Fibre Channel or iSCSI, presents itself to the server in exactly the same way as the capacity of the main system, so the server cannot tell that it is external.
All these solutions decouple the LUNs or volumes of the storage systems from the server; are capable of boosting performance through effective caching, pooling, or tiering across multiple disk arrays and disk types; and provide additional functions, such as standardized cloning, snapshots, encryption, and mirroring for all virtualized storage resources in a uniform manner. These solutions are particularly well suited for the use of high-availability (HA) storage across several locations.
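As an illustration of the tiering idea mentioned above, the following sketch decides which extents belong on a small, fast tier (e.g., flash) and which stay on a larger, slower tier. It assumes a very simple policy, namely that the most frequently read extents are promoted; real products use their own, usually more sophisticated heuristics. The function name, extent labels, and access log are hypothetical.

```python
from collections import Counter


def plan_tiers(access_log, fast_capacity):
    """Return (fast, slow) extent lists: the most-read extents go to the fast tier."""
    counts = Counter(access_log)
    ranked = [extent for extent, _ in counts.most_common()]
    return ranked[:fast_capacity], ranked[fast_capacity:]


# Hypothetical read log: each entry names the extent that was accessed.
log = ["e1", "e7", "e1", "e3", "e1", "e7", "e9"]
fast, slow = plan_tiers(log, fast_capacity=2)
print(fast)  # ['e1', 'e7']
print(slow)  # ['e3', 'e9']
```

Because the virtualization layer sits between server and storage, it can move extents between tiers in this way without the server noticing any change in the LUN it addresses.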
Virtualized vs. Software-Defined
Software-defined storage (SDS) technology goes one step further. Hardware-independent software responsible for storage virtualization is installed on each physical server involved; it acts much like a hypervisor for storage, bundling and centrally orchestrating the storage resources of the servers. In the case of VMware with vSAN or Windows Server 2016 with Storage Spaces Direct, such functions are already included in the platform, which allows the storage resources of the individual servers to be completely decoupled from the hardware and grouped into pools. Services such as deduplication, compression, and data protection are also offered.
A kind of erasure coding (i.e., the intelligent distribution of data and redundancy information across several instances) ensures that the data is stored in a fail-safe manner. Unlike conventional SAN storage virtualization, SDS can also manage the local or directly attached hard drives of the individual servers. SDS solutions can even integrate the unused RAM of the servers as a kind of cache with extremely fast access times. SDS as a relatively new virtualization technology is generally considered to have the greatest potential for the future. However, it remains to be seen to what extent this technology can also be used for highly heterogeneous server environments or I/O-intensive applications.
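The principle behind erasure coding can be shown with the simplest possible code: a single XOR parity fragment, as known from RAID 5. The sketch below splits a payload into three data fragments plus one parity fragment, "loses" one of them, and rebuilds it from the survivors. This is an illustration only; SDS products typically use more capable codes that survive several simultaneous failures, and the function names and sample payload are assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data, k):
    """Split data into k equal fragments (zero-padded) plus one parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return fragments + [xor_blocks(fragments)]


def recover(fragments):
    """Rebuild the single missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) != 1:
        raise ValueError("this sketch tolerates exactly one lost fragment")
    survivors = [f for f in fragments if f is not None]
    repaired = list(fragments)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


payload = b"highly available storage"
stored = encode(payload, k=3)   # 3 data fragments + 1 parity, one per node
stored[1] = None                # simulate the failure of one node
restored = recover(stored)
original = b"".join(restored[:3])[:len(payload)]
print(original == payload)      # True
```

With three data fragments and one parity fragment, the redundancy overhead is one third of the payload, compared with 100 percent for a full mirror, at the price of tolerating only a single failure in this simple scheme.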