Storage innovations in Windows Server 2016
Windows Server 2016 offers many advances in network storage. To understand what is happening in Microsoft storage now, it is best to start with a recap of some innovations that arrived in Windows Server 2012.
With the Windows Server 2012 release, Microsoft first unveiled an option for setting up a file server for application data using on-board tools. This feature assumes two to eight servers that run a file server in a failover cluster and thus offer high availability. The storage can either be SAS disks in enclosures or logical unit numbers (LUNs) attached via Fibre Channel Storage Area Network (FC SAN)/Internet Small Computer System Interface (iSCSI). This storage is then provided to the application servers, such as Hyper-V or SQL Server, over the network. SMB version 3 is used as the protocol.
In Windows Server 2012 R2, Microsoft offered the ability to use SSDs and HDDs simultaneously in a storage pool for performance reasons. This technology, known as tiering, automatically moves frequently used data in 1MB chunks to fast disks (SSDs) during operation, while data that sees little or no use is stored on HDDs. This technique gives admins the ability to build high-performance, highly available, and economically attractive storage solutions.
If you are using SSDs, 1GB of the available space is used as a write-back cache by default. This reduces the latency of write operations and their negative performance impact on other file operations. Other new features in Windows Server 2012 R2 were support for parity disks in a failover cluster, the use of dual parity (similar to RAID 6), and the ability to automatically repair or recreate Storage Spaces given sufficient free space in the pool. This repair feature removed the need for "hot spare" media; the free space on the remaining functional disks is used to restore data integrity.
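As a rough sketch of how such a tiered space is built with the on-board cmdlets in Windows Server 2012 R2 (the pool, tier, and disk names, as well as the tier sizes, are placeholders; a mirrored layout is assumed):

```powershell
# Pool all eligible local disks into a new storage pool
New-StoragePool -FriendlyName Pool01 -StorageSubSystemFriendlyName "Storage Spaces*" `
    -PhysicalDisks (Get-PhysicalDisk -CanPool $true)

# Define an SSD tier and an HDD tier inside the pool
$ssdTier = New-StorageTier -StoragePoolFriendlyName Pool01 -FriendlyName SSDTier -MediaType SSD
$hddTier = New-StorageTier -StoragePoolFriendlyName Pool01 -FriendlyName HDDTier -MediaType HDD

# Create a mirrored, tiered virtual disk; the 1GB write-back cache is the default
# when SSDs are present and is set explicitly here only for clarity
New-VirtualDisk -StoragePoolFriendlyName Pool01 -FriendlyName TieredSpace01 `
    -StorageTiers $ssdTier,$hddTier -StorageTierSizes 100GB,900GB `
    -ResiliencySettingName Mirror -WriteCacheSize 1GB
```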
IOPS with Storage Spaces Direct
The upcoming Windows Server 2016 operating system adds some significant improvements in storage technology (Figure 1). Many companies already use solutions that draw on local storage from multiple hosts and group it logically on the network. Microsoft enters this market with Storage Spaces Direct (S2D). The storage servers you use no longer need a shared connection to one or more disk enclosures; you can now use locally installed media. The disks of multiple servers are combined into a pool that serves as the basis for one or more virtual disks (vDisks). However, in building such an array, note that you can only lose a maximum of two servers (depending on the number of nodes).
Windows Server continues to rely on the SMB protocol and has even improved it with the latest release, so you can continue to use the core functionality of SMB3:
- SMB Multichannel: When using multiple network interface cards (NICs), the usable bandwidth increases, because data is transferred across all available NICs. The failure of one or more adapters is handled transparently; the connection does not break down as long as at least one path remains. Multichannel does not require teaming, although it is technically possible to use multiple teams. Configuring Multipath I/O (MPIO) is not supported; the handling of multiple cards is defined in the SMB protocol itself.
- Remote Direct Memory Access (RDMA): Special NICs enable SMB Direct, which allows servers to drop data directly into the RAM of partner servers. The work of the data transfer is not placed on the CPU; the adapters handle it autonomously. RDMA allows for extremely high-performance data transfers with very low latency and a very low CPU load. A short check of both features with on-board cmdlets follows this list.
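Whether Multichannel and RDMA are actually in play is easy to verify with a few standard SMB and network cmdlets; a minimal check, run on a node with active SMB connections, might look like this:

```powershell
# NICs the SMB client considers usable for Multichannel (RSS- and RDMA-capable interfaces)
Get-SmbClientNetworkInterface

# Active SMB connections and the interfaces they are spread across
Get-SmbMultichannelConnection

# Confirm that the installed adapters expose RDMA and that it is enabled
Get-NetAdapterRdma

# Multichannel is on by default; this shows the server-side switch
Get-SmbServerConfiguration | Select-Object EnableMultiChannel
```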
If you want to use S2D, there is already a fair amount of detail about the necessary preconditions: You can now also use SATA disks; the requirement for using devices with SAS ports has been dropped. Currently, SAS SSDs, in particular, are quite expensive. In the future, you will be able to go for the slightly cheaper SATA versions. But this does not mean that you should use cheap consumer-level components. Always go for enterprise hardware (see "Getting the Right Hardware" box).
Getting the Right Hardware
When using enclosures for provisioning storage, make sure that you use SAS-only enclosures that pass the disks through one-to-one to the server and do not interpose a RAID controller. Also, be sure to use Microsoft-certified hardware. Although the enclosures are "dumb," there are differences in their properties. See the Windows Server Catalog for the list of certified hardware [1]. Look specifically for hardware that you can use for Storage Spaces.
If the performance of commercially available SSDs is insufficient, Windows Server 2016 lets you use Non-Volatile Memory Express (NVMe) devices. NVMe is flash memory connected directly to the PCI Express bus, which works around the limitations of the SAS/SATA bus. Microsoft presented a corresponding setup in collaboration with Intel at the Intel Developer Conference in San Francisco [2]. The minimum number of servers in such a cluster is four systems, each of which needs at least 64GB of RAM. As a minimum requirement, you need one SSD/flash device per node. However, installations that use only hard drives are very rare in practice, so sufficient SSD capacity exists in most cases. This is primarily because ordinary hard drives deliver between 100 and 200 IOPS, whereas an SSD achieves 20,000-100,000 IOPS, depending on the model. This massive difference makes tiering extremely attractive, even if the price of an SSD at first appears quite high.
On the network, you will want to use RDMA cards with a rate of at least 10Gbps; cards with 40 or 56Gbps are better. In terms of price, investing in a 40Gbps infrastructure does not cost much more than buying a 10Gbps infrastructure.
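To give a feel for how little is involved in standing up such a cluster, the following sketch uses the publicly documented cmdlets; the node names, cluster name, and volume size are placeholders, and details may still change before the final release:

```powershell
# Validate the four nodes, including the Storage Spaces Direct checks
Test-Cluster -Node node01,node02,node03,node04 `
    -Include "Storage Spaces Direct","Inventory","Network","System Configuration"

# Create the cluster without assigning any shared storage
New-Cluster -Name S2D-CL01 -Node node01,node02,node03,node04 -NoStorage

# Enable S2D: eligible local disks are claimed, pooled, and the flash cache is configured
Enable-ClusterStorageSpacesDirect

# Carve a resilient CSV volume out of the pool for VM data
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName VDisk01 `
    -FileSystem CSVFS_ReFS -Size 2TB
```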
Large and Small Scale
Two possibilities exist for using S2D: You can run a scale-out file server, which is used exclusively for provisioning the available storage. Your Hyper-V virtual machines (VMs) then run on separate hosts that access the storage via SMB3. Alternatively, you can use a hyper-converged failover cluster, in which case your storage and your Hyper-V VMs run on the same hardware. This option was not previously supported with Windows on-board resources.
If you operate a very small environment that undergoes relatively little change in the course of operation, the hyper-converged model could be just the thing for you. At least four servers with local disks and sufficient RAM capacity, CPU performance, and network bandwidth will let you operate your entire, highly available VM infrastructure. This approach is not new; there are already quite a few providers that offer such a solution. Technically Microsoft is not offering a world premiere here, but as a user, you have the advantage that the technology is already included in the license, and you do not need to buy any commercial add-on software.
The second variant, separating storage and Hyper-V, is primarily suited to medium and large environments. Especially with a large number of VMs, the number of Hyper-V hosts in the failover cluster is significantly larger than the number of nodes in the storage cluster. In addition, if you need more Hyper-V hosts, you only need to extend that cluster without directly growing your storage. If you use a hyper-converged cluster, you need to expand computing power and storage at the same time; adding storage-only or compute-only nodes is not possible. Each additional server requires the same features as the existing systems, even if the additional space is not actually required.
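For the separated variant, the storage cluster additionally publishes its volumes as SMB3 shares through the Scale-Out File Server role. A minimal sketch, in which the role name, share path, and the Hyper-V host group are assumptions, could look like this:

```powershell
# Add the Scale-Out File Server role to the storage cluster
Add-ClusterScaleOutFileServerRole -Name SOFS01 -Cluster S2D-CL01

# Create a folder on a CSV volume and share it to the Hyper-V hosts and admins
New-Item -Path C:\ClusterStorage\Volume1\VMs -ItemType Directory
New-SmbShare -Name VMs -Path C:\ClusterStorage\Volume1\VMs `
    -FullAccess "DOMAIN\Hyper-V-Hosts","DOMAIN\Hyper-V-Admins"
```

The Hyper-V hosts then store their VMs under the UNC path of this share instead of on local volumes.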
Replication with On-Board Tools
Also new in Windows Server 2016 is the ability to replicate your data without a hardware SAN or a third-party software solution. You can replicate your data synchronously or asynchronously. In synchronous replication, the data on the primary side is written directly to the secondary side. Technically, the process is as follows: When an application writes data to its storage, the operation is written to a log. This log needs to reside in flash memory so that the process completes as quickly as possible. At the same time, the data is transmitted to the remote side and also written to a log there. Once the remote side confirms the successful write, the application that created the data receives confirmation that the write operation has completed successfully. In the background, the changes are then written out from the log to the actual disk; this step no longer delays the application, which has already received its write confirmation. To ensure that this type of replication does not impact the performance of your VMs, the latency between the two sites needs to be minimal. Microsoft specifies a maximum round-trip time of five milliseconds; lower values are, of course, preferable.
Asynchronous replication behaves a little differently. An application or VM on the primary side writes a change to its storage. As in the synchronous variant, this data is first written to a log, which ideally resides on a flash disk. Once this is complete, the application receives its confirmation. While the application is already generating new data, the information from the first operation is transmitted from the log to the remote side and written to a log there. Once this process is complete, the remote side confirms successful storage. The data is then also written out from the log to the disk.
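Before deciding between synchronous and asynchronous replication, it is worth measuring the link; Storage Replica ships with a validation cmdlet for exactly this purpose. A sketch with placeholder server names, volumes, and paths:

```powershell
# Record IOPS, bandwidth, and round-trip latency between both sides for 30 minutes
# and write an HTML report with a recommendation to C:\Temp
Test-SRTopology -SourceComputerName SRV01 -SourceVolumeName D: -SourceLogVolumeName L: `
    -DestinationComputerName SRV02 -DestinationVolumeName D: -DestinationLogVolumeName L: `
    -DurationInMinutes 30 -ResultPath C:\Temp
```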
In both versions, you need to consider a number of conditions:
- The primary and secondary sides need to have the same disks, disk types, volumes, formatting, and block size.
- The log should be stored on flash memory so that it can be written as quickly as possible.
- Synchronous replication requires a maximum latency of five milliseconds. RDMA is an option but not mandatory.
- The bandwidth should be as high as possible. As a minimum, you should go for 10Gbps; higher bandwidths are always better.
The use of Storage Replica is possible in several scenarios, independently of synchronous or asynchronous transmission. Technically, data replication occurs at the volume level, which means it does not matter how the data is stored on the volume, what filesystem you use, or where the data is located. Storage Replica is not a file-based replication mechanism like Distributed File System Replication (DFSR). You can replicate data between two standalone servers, within a stretch cluster, or between two clusters. SMB3 is used to transfer the data, which allows the use of various SMB techniques: Multichannel, Kerberos support, encryption "over the wire," and signing.
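Setting up server-to-server replication itself comes down to a single cmdlet. The following sketch assumes the data volume D: and a flash-backed log volume L: on both servers, with placeholder names for the servers and replication groups:

```powershell
# Replicate D: from SRV01 to SRV02 synchronously; the logs go to L: on both sides
# For asynchronous replication, set -ReplicationMode Asynchronous instead
New-SRPartnership -SourceComputerName SRV01 -SourceRGName RG01 `
    -SourceVolumeName D: -SourceLogVolumeName L: `
    -DestinationComputerName SRV02 -DestinationRGName RG02 `
    -DestinationVolumeName D: -DestinationLogVolumeName L: `
    -ReplicationMode Synchronous -LogSizeInBytes 8GB
```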
The target volume is not accessible while replication is active. It is also a mistake to view the replication of your data as a backup: If a logical error occurs in your data on the primary side, for example, that error is replicated as well, and you have no backup of the data and thus no way to revert to the original, correct state.