Lead Image © Denis Dryashkin, 123RF.com

NVDIMM and the Linux kernel

Steadfast Storage

Article from ADMIN 35/2016
Non-volatile dual in-line memory modules will provide storage as fast as RAM and keep its content through a reboot. The Linux kernel is already geared to handle the new technology and can even serve the modules up as block devices.

Many databases, such as Berkeley DB, Oracle TimesTen, Apache Derby, and MySQL Cluster, can be configured to run completely in their host's main memory; SAP HANA is even an in-memory-only database. Such products are used when a quick response is needed despite a high workload. Classic high-performance computing also likes to keep working data in RAM on the node so that CPUs do not have to wait too long for data from mass storage.

Because RAM content is always lost on a shutdown or failure, such systems need to be backed up periodically to a non-volatile medium, a process that is not especially reliable in practice and can slow down the entire system. Non-volatile RAM would therefore be valuable, and it is coming to market soon as the non-volatile DIMM (NVDIMM). The role model was the buffer cache memory on RAID controllers, which keeps its content when the system is switched off to ensure the integrity of its data.

In December 2015, the JC-45 subcommittee of the JEDEC industry association [1] published specifications for persistent memory modules that fit the sockets for DDR4 SDRAM DIMMs. However, motherboards need a BIOS that supports the modules. JEDEC envisages two approaches: NVDIMM-N backs up the content of its DRAM in case of a power outage and restores the data when the voltage is restored. NVDIMM-F behaves like an SSD attached to the memory bus.

Useful Hybrids

NVDIMM is a wanderer between two worlds. At first glance, the modules are easily mistaken for ordinary DDR4 DIMMs (Figure 1); only a backup capacitor (often a large one), batteries in some cases, and a flash chip reveal their special nature. If the server needs to shut down, the capacitors or batteries keep the module powered until its content has been written completely to the flash chips. Newer technologies even promise to remove this step by making flash fast enough for direct use as main memory.

Figure 1: NVDIMMs (top, front; bottom, back) are still not commercially available, but HP samples are being tested in SUSE Labs. SLE and RHEL already partially support the memory type (© Hewlett-Packard).

A typical use of NVDIMMs, besides the in-memory databases mentioned earlier, would be high-performance computing. Linux developers adapted the kernel to enable DMA transfers to and from NVDIMM – for example, to exchange data faster within a cluster using InfiniBand.

Not every data center needs to keep terabyte-scale databases in memory or calculate the volume of underground gas deposits; however, NVDIMMs can also help with setups that simply want to run storage caching using Bcache or dm-cache. Then, the NVDIMM hardware would act as a faster and safer buffer between the kernel's RAM buffer cache and slow, but large, hard drives or SSDs.

The Nearer, the Faster

The traditional storage pyramid (Figure 2) shows the speed differences between technologies. In the case of DRAM, memory cells consisting of capacitors are arranged in a matrix that the memory controller can address quickly by row and column. The controller is architecturally very close to the CPU and is connected to it by a fast parallel bus.

Figure 2: The storage pyramid maps important criteria for the use of various technologies and makes it clear that, as cost and performance increase, storage moves ever closer to the CPU.

The CPU, however, typically addresses SSDs via serial attached SCSI (SAS) or serial ATA (SATA). These protocols need their own host bus adapters, which are usually connected to the CPU via PCI Express. Even if you avoid going through an additional chip, as is the case with NVMe or SCSI PQI, the kernel still needs to package the data into SCSI or NVMe commands, store the data in main memory, inform the target DMA controller (SSD or NVMe card) of the memory areas involved, and initiate the DMA transfer (for the terminology, see Table 1) [2].

Table 1

Glossary

Abbreviation Explanation
DIMM: Dual In-line Memory Module.
DMA: Direct Memory Access. A method of transferring data between main memory and hardware without the help of the CPU.
DRAM: Dynamic Random Access Memory. Volatile memory consisting of capacitors, typically used as main memory in PCs and servers.
HBA: Host Bus Adapter. A hardware interface that connects a computer to a bus or a network. The term is commonly used, especially in the SCSI world, to describe the adapters that connect to hard drives or to the SAN (Storage Area Network).
NVDIMM: Non-Volatile Dual In-line Memory Module. Very fast DIMM memory for use either as main memory or as a mass storage device.
NVMe: Non-Volatile Memory Express. Flash memory attached via PCI (Peripheral Component Interconnect) Express, InfiniBand, or Fibre Channel, with lower latency and higher throughput than traditional SSDs.
SCSI PQI: SCSI PCIe Queuing Interface.

But wait! There's more. After successfully completing the transfer, an interrupt notifies the driver so that it can initiate the next transmission. These transfers also must be the correct size, because a disk is a block device that only processes data in blocks of a fixed size – typically 512 bytes, or 4,096 bytes for large hard drives and flash media.
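
To see this block granularity from user space, the following sketch reads a single block from a block device opened with O_DIRECT, which bypasses the page cache. The device path /dev/sda and the 4,096-byte block size are assumptions for the example; a real program would query the logical block size with the BLKSSZGET ioctl.

#define _GNU_SOURCE            /* needed for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t blk = 4096;   /* assumed logical block size */
    void *buf;

    /* With O_DIRECT, both the buffer address and the request length
       must be aligned to the device's block size. */
    if (posix_memalign(&buf, blk, blk) != 0) {
        perror("posix_memalign");
        return 1;
    }

    int fd = open("/dev/sda", O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    ssize_t n = read(fd, buf, blk);   /* transfers exactly one block */
    if (n < 0)
        perror("read");
    else
        printf("read %zd bytes (one block)\n", n);

    close(fd);
    free(buf);
    return 0;
}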

Block devices are at a disadvantage by design when it comes to handling large numbers of very small data units. Operating systems respond with counter strategies: One of these is the page cache, which sits between the filesystem and the kernel's block I/O layer and keeps the most frequently used data in main memory. Another is the I/O scheduler, which tries to order and batch accesses to the hardware as favorably as possible.

Low Latency and High Granularity

For some applications, these strategies are not enough. Hardware developers are therefore increasingly attempting to shift mass storage closer to the CPU. The popular statement by Microsoft Research employee Jim Gray puts it in a nutshell: "Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King" [3]. Whereas previous technologies moved storage ever closer to the memory controller, the new approach places mass storage directly on the memory controller, parallel to main memory.

Having mass storage parallel to main memory offers several advantages. For one, it unsurprisingly reduces latency significantly – that is, the delay between issuing an I/O request toward the filesystem and receiving the first data. For another, block size becomes more or less irrelevant: Whereas access to classic block devices relies on fixed, fairly large blocks, RAM is addressable byte- or word-wise. In practice, however, the even faster CPU caches determine how main memory is accessed: Operations pass through the processor caches and thus effectively have cache-line granularity.
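
A minimal sketch of what byte-granular access with cache-line persistence could look like from user space follows. It assumes an x86 system and a file on a filesystem mounted with the dax option on an NVDIMM-backed device; the path /mnt/pmem/demo.dat is an assumption for the example, not something prescribed by the kernel.

#include <emmintrin.h>   /* _mm_clflush, _mm_sfence (x86 SSE2) */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CACHELINE 64

int main(void)
{
    int fd = open("/mnt/pmem/demo.dat", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

    /* With DAX, the mapping points straight at the NVDIMM: loads and
       stores reach the persistent medium without the page cache. */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    const char msg[] = "hello, persistent memory";
    memcpy(p, msg, sizeof(msg));   /* an ordinary store, byte granularity */

    /* Persistence, however, works per cache line: flush every line the
       write touched, then fence to order the flushes. */
    for (size_t off = 0; off < sizeof(msg); off += CACHELINE)
        _mm_clflush(p + off);
    _mm_sfence();

    munmap(p, 4096);
    close(fd);
    return 0;
}

A portable alternative is to call msync() on the affected range and let the kernel make the data durable; the explicit flush loop above merely makes the cache-line granularity visible.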
