NVDIMM and the Linux kernel

Steadfast Storage

Testing with Linux

You can test NVDIMM on a Linux system as of kernel 4.1, but versions 4.6 or later are recommended. If you like, you can emulate NVDIMM hardware without the physical hardware using QEMU, or you can assign part of main memory to the NVDIMM subsystem by adding the memmap parameter to the kernel command line in the boot manager. Without appropriate hardware, write operations in this area are naturally not persistent.
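For the QEMU route, the following sketch prints a set of options that should expose one emulated NVDIMM to a guest. It assumes a QEMU version with NVDIMM support (2.6 or later); the helper name qemu_nvdimm_args, the sizes, and the backing file path are illustrative assumptions, not values from this article.

```shell
# Sketch only: print QEMU options that expose one emulated NVDIMM to a guest.
# qemu_nvdimm_args is a hypothetical helper; all defaults are example values.
qemu_nvdimm_args() {
    ram_size=${1:-4G}               # regular guest RAM
    nvdimm_size=${2:-1G}            # size of the emulated NVDIMM
    backing=${3:-/tmp/nvdimm.img}   # assumed backing file on the host
    # maxmem must be at least RAM plus NVDIMM size; 8G covers the defaults.
    printf '%s\n' \
        -machine pc,nvdimm=on \
        -m "${ram_size},slots=2,maxmem=8G" \
        -object "memory-backend-file,id=mem1,share=on,mem-path=${backing},size=${nvdimm_size}" \
        -device nvdimm,id=nvdimm1,memdev=mem1
}
```

You could then start a guest with something like qemu-system-x86_64 $(qemu_nvdimm_args) followed by your usual disk and display options. Data written inside the guest ends up in the backing file, so it is only as persistent as that file.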

A dmesg command shows free areas (Listing 1). Areas marked usable can be used by the NVDIMM driver. The last line shows a usable area of 46GiB that extends from the 4GiB mark up to 50GiB. Booting with the kernel parameter

memmap=16G!4G

Listing 1

Showing Memory with dmesg

# dmesg | grep BIOS-e820
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000005efff] usable
[    0.000000] BIOS-e820: [mem 0x000000000005f000-0x000000000005ffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000060000-0x000000000009ffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000077470fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000077471000-0x00000000774f1fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000774f2000-0x0000000078c82fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000078c83000-0x000000007ac32fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007ac33000-0x000000007b662fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007b663000-0x000000007b7d2fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000007b7d3000-0x000000007b7fffff] usable
[    0.000000] BIOS-e820: [mem 0x000000007b800000-0x000000008fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000000c7fffffff] usable

reserves 16GiB of RAM starting at the 4GiB mark (0x0000000100000000) and hands it over to the NVDIMM driver. Once you have modified the kernel command line and rebooted the system, you will see an entry similar to Listing 2 in your kernel log. The storage is clearly divided: The kernel has tagged 0x0000000100000000 to 0x00000004ffffffff (4GiB to 20GiB) as persistent (type 12). The /dev/pmem0 device shows up after loading the driver. Now, working as root, you can type

mkfs.xfs /dev/pmem0

to create an XFS filesystem and mount it as usual with:

mount -o dax /dev/pmem0 /mnt

Note that the mount option here is dax, which enables the aforementioned DAX functionality.
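To confirm that the option took effect, you can look for dax in the mount options. The helper below is a minimal sketch: has_dax is a hypothetical name, and it assumes the option appears as dax (or dax=always on newer kernels) in the fourth field of /proc/mounts.

```shell
# has_dax MOUNTPOINT [MOUNTS_FILE]
# Succeeds if the given mount point lists the dax option.
# Hypothetical helper; MOUNTS_FILE defaults to /proc/mounts.
has_dax() {
    mounts_file=${2:-/proc/mounts}
    # Field 2 is the mount point, field 4 the comma-separated option list.
    # Wrapping the list in commas lets the pattern match whole options only.
    awk -v mp="$1" '
        $2 == mp && ("," $4 ",") ~ /,dax(=always)?,/ { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}
```

For example, has_dax /mnt && echo "DAX is active on /mnt" prints the message only when the filesystem was really mounted with DAX.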

Listing 2

NVDIMM Memory After memmap

[    0.000000] user: [mem 0x0000000000000000-0x000000000005efff] usable
[    0.000000] user: [mem 0x000000000005f000-0x000000000005ffff] reserved
[    0.000000] user: [mem 0x0000000000060000-0x000000000009ffff] usable
[    0.000000] user: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[    0.000000] user: [mem 0x0000000000100000-0x0000000075007017] usable
[    0.000000] user: [mem 0x0000000075007018-0x000000007500f057] usable
[    0.000000] user: [mem 0x000000007500f058-0x0000000075010017] usable
[    0.000000] user: [mem 0x0000000075010018-0x0000000075026057] usable
[    0.000000] user: [mem 0x0000000075026058-0x0000000077470fff] usable
[    0.000000] user: [mem 0x0000000077471000-0x00000000774f1fff] reserved
[    0.000000] user: [mem 0x00000000774f2000-0x0000000078c82fff] usable
[    0.000000] user: [mem 0x0000000078c83000-0x000000007ac32fff] reserved
[    0.000000] user: [mem 0x000000007ac33000-0x000000007b662fff] ACPI NVS
[    0.000000] user: [mem 0x000000007b663000-0x000000007b7d2fff] ACPI data
[    0.000000] user: [mem 0x000000007b7d3000-0x000000007b7fffff] usable
[    0.000000] user: [mem 0x000000007b800000-0x000000008fffffff] reserved
[    0.000000] user: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] user: [mem 0x0000000100000000-0x00000004ffffffff] persistent (type 12)
[    0.000000] user: [mem 0x0000000500000000-0x0000000c7fffffff] usable
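The sizes behind these address ranges are easy to check with shell arithmetic. The following sketch uses a hypothetical helper, range_gib, that extracts the two hexadecimal addresses from a kernel log line and prints the size of the range in GiB; it assumes the "[mem START-END]" format shown in Listings 1 and 2.

```shell
# range_gib LINE
# Print the size in GiB of the "[mem START-END]" range in a kernel log line.
# Hypothetical helper for illustration, not part of any tool.
range_gib() {
    start=$(printf '%s\n' "$1" |
        sed -n 's/.*\[mem \(0x[0-9a-fA-F]*\)-0x[0-9a-fA-F]*\].*/\1/p')
    end=$(printf '%s\n' "$1" |
        sed -n 's/.*\[mem 0x[0-9a-fA-F]*-\(0x[0-9a-fA-F]*\)\].*/\1/p')
    # The end address is inclusive, so add 1 before dividing by 2^30.
    echo $(( (${end} + 1 - ${start}) >> 30 ))
}

# The persistent range from Listing 2:
range_gib '[    0.000000] user: [mem 0x0000000100000000-0x00000004ffffffff] persistent (type 12)'
# prints 16
```

Applied to the last line of Listing 2, the same helper reports 30GiB of RAM left as usable above the persistent area.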

Speed Test

If you own NVDIMMs and want to check out the speed advantage, all you need is a short test with the Unix dd tool. In Listing 3, root copies 4GiB from the zero device /dev/zero to the PMEM mount point. The oflag=direct flag bypasses the kernel's page cache and thus taps into the power of the physical NVDIMMs (from the preliminary series by HP). The same test on the hard drive of the host returns values of around 50MBps (over a test period of around 80 seconds).

Listing 3

Benchmarking

$ dd if=/dev/zero of=/mnt/test.dat oflag=direct bs=4k count=$((1024*1024))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 4.55899 s, 942 MB/s
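The rate that dd reports is simply bytes divided by seconds, expressed in decimal megabytes. As a quick sanity check, the sketch below recomputes it with awk; throughput_mbps is a hypothetical helper name.

```shell
# throughput_mbps BYTES SECONDS
# Print the transfer rate in decimal megabytes per second, rounded.
# Hypothetical helper for illustration.
throughput_mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 1000000 }'
}

# Figures from Listing 3:
throughput_mbps 4294967296 4.55899
# prints 942
```

Running the same calculation with 80 seconds yields about 54MB/s, which matches the hard drive result of around 50MBps mentioned above.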

Availability

NVDIMMs will probably go on sale to the general public in 2017. To make the Linux kernel suitable for the fast modules, the hardware industry is handing samples to dedicated kernel development teams such as SUSE Labs. Because the results go upstream, they will benefit all distributions, which will then roll out the appropriate kernels and drivers step by step. Four distributions can already handle NVDIMMs:

  • openSUSE Tumbleweed
  • openSUSE Leap 42.2
  • SUSE Linux Enterprise 12 SP2
  • Fedora 24

During our research, we found no evidence of Debian and Ubuntu following suit, but this will most likely happen before the year is out.

 


The Author

Johannes Thumshirn works at SUSE Linux GmbH as a Linux kernel developer for storage, especially NVMe and NVDIMM, as well as traditional storage technologies such as FC/FCoE, SCSI, and SAS.

Markus Feilner is a Linux specialist from Regensburg, Germany. He has worked with Linux as an author, trainer, consultant, and journalist since 1994. The Conch diplomat, Minister of the Universal Life Church, and Jedi Knight today heads the documentation team at SUSE in Nuremberg, Germany.
