NVDIMM and the Linux kernel
Steadfast Storage
Testing with Linux
You can test NVDIMM on a Linux system as of kernel 4.1, but versions 4.6 or later are recommended. If you like, you can emulate NVDIMM hardware without the physical hardware using QEMU, or you can assign part of main memory to the NVDIMM subsystem by adding the memmap parameter to the kernel command line in your boot manager. Without appropriate hardware, write operations in this area are naturally not persistent.
A dmesg command shows free areas (Listing 1). Areas marked usable can be used by the NVDIMM driver. The last line shows a usable area that runs from 4GiB to 50GiB (46GiB in total).
Listing 1
Showing Memory with dmesg
# dmesg | grep BIOS-e820
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000005efff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000005f000-0x000000000005ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000060000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000077470fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000077471000-0x00000000774f1fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000774f2000-0x0000000078c82fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000078c83000-0x000000007ac32fff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007ac33000-0x000000007b662fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000007b663000-0x000000007b7d2fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000007b7d3000-0x000000007b7fffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007b800000-0x000000008fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000000c7fffffff] usable
Booting with memmap=16G!4G on the kernel command line reserves 16GiB of RAM starting at the 4GiB mark (0x0000000100000000) and hands it over to the NVDIMM driver. Once you have modified the kernel command line and rebooted the system, you will see an entry similar to Listing 2 in your kernel log. The storage is clearly divided: The kernel has tagged 0x0000000100000000 to 0x00000004ffffffff (4-20GiB) as persistent (type 12). The /dev/pmem0 device shows up after loading the driver. Now, working as root, you can type
mkfs.xfs /dev/pmem0
to create an XFS filesystem, which you then mount with:
mount -o dax /dev/pmem0 /mnt
Note that the mount option here is dax, which enables the aforementioned DAX functionality.
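To confirm that the filesystem really came up with DAX, you can look at the active mount options. The following is only a minimal sketch and not part of the original test setup: has_dax is a hypothetical helper name, and it assumes the kernel reports the option either as dax or, on newer kernels, as dax=always.

```shell
#!/bin/sh
# has_dax: hypothetical helper. Succeeds if a comma-separated mount option
# string contains "dax" or "dax=always" as a complete option.
# Assumption: these are the two spellings the kernel uses for an active DAX mount.
has_dax() {
    case ",$1," in
        *,dax,*|*,dax=always,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call (needs a mounted filesystem, so it is left commented out):
# has_dax "$(findmnt -n -o OPTIONS /mnt)" && echo "DAX is active on /mnt"
```

Wrapping the option string in commas before matching keeps the check from accepting an option that merely ends in the letters dax.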
Listing 2
NVDIMM Memory After memmap
[ 0.000000] user: [mem 0x0000000000000000-0x000000000005efff] usable
[ 0.000000] user: [mem 0x000000000005f000-0x000000000005ffff] reserved
[ 0.000000] user: [mem 0x0000000000060000-0x000000000009ffff] usable
[ 0.000000] user: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[ 0.000000] user: [mem 0x0000000000100000-0x0000000075007017] usable
[ 0.000000] user: [mem 0x0000000075007018-0x000000007500f057] usable
[ 0.000000] user: [mem 0x000000007500f058-0x0000000075010017] usable
[ 0.000000] user: [mem 0x0000000075010018-0x0000000075026057] usable
[ 0.000000] user: [mem 0x0000000075026058-0x0000000077470fff] usable
[ 0.000000] user: [mem 0x0000000077471000-0x00000000774f1fff] reserved
[ 0.000000] user: [mem 0x00000000774f2000-0x0000000078c82fff] usable
[ 0.000000] user: [mem 0x0000000078c83000-0x000000007ac32fff] reserved
[ 0.000000] user: [mem 0x000000007ac33000-0x000000007b662fff] ACPI NVS
[ 0.000000] user: [mem 0x000000007b663000-0x000000007b7d2fff] ACPI data
[ 0.000000] user: [mem 0x000000007b7d3000-0x000000007b7fffff] usable
[ 0.000000] user: [mem 0x000000007b800000-0x000000008fffffff] reserved
[ 0.000000] user: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] user: [mem 0x0000000100000000-0x00000004ffffffff] persistent (type 12)
[ 0.000000] user: [mem 0x0000000500000000-0x0000000c7fffffff] usable
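The address ranges in the two listings are easier to check with a little shell arithmetic. The sketch below is an illustration only; range_gib and memmap_arg are hypothetical helper names, and it assumes a shell whose arithmetic expansion handles 64-bit hexadecimal values (bash and dash do).

```shell
#!/bin/sh
# range_gib: size in GiB of an inclusive address range as printed in the
# kernel's memory map (hypothetical helper; bounds may be given in hex).
range_gib() {
    echo $(( ($2 - $1 + 1) >> 30 ))
}

# memmap_arg: build a memmap=<size>!<start> kernel parameter from a size
# and a start offset, both in GiB (hypothetical helper).
memmap_arg() {
    printf 'memmap=%sG!%sG\n' "$1" "$2"
}

# Example calls with the values from the listings:
# range_gib 0x0000000100000000 0x00000004ffffffff   # persistent range
# range_gib 0x0000000100000000 0x0000000c7fffffff   # usable range before the change
# memmap_arg 16 4
```

For the persistent range this yields 16, which matches the 16GiB requested with memmap. If you type the parameter at an interactive bash prompt rather than in a configuration file, quote it, because an unquoted ! can trigger history expansion.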
Speed Test
If you own NVDIMMs and want to check out the speed advantage, all you need is a short test with the Unix dd tool. In Listing 3, root copies 4GiB from the zero device /dev/zero to the PMEM mount point. The oflag=direct flag bypasses the kernel's buffer cache and thus taps into the power of the physical NVDIMMs (from the preliminary series by HP). The same test on the hard drive of the host returns values of around 50MBps (over a test period of around 80 seconds).
Listing 3
Benchmarking
$ dd if=/dev/zero of=/mnt/test.dat oflag=direct bs=4k count=$((1024*1024))
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 4.55899 s, 942 MB/s
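The figures in the dd summary line can be reproduced by hand, which is useful when comparing several runs. The following sketch uses two hypothetical helpers, dd_bytes and throughput_mb, and assumes that awk is available and that dd reports decimal megabytes (1MB = 10^6 bytes).

```shell
#!/bin/sh
# dd_bytes: total bytes written for a block size (in bytes) and a block count.
dd_bytes() {
    echo $(( $1 * $2 ))
}

# throughput_mb: decimal megabytes per second, rounded to a whole number.
# Arguments: total bytes, elapsed seconds.
throughput_mb() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.0f\n", bytes / secs / 1000000 }'
}

# Example calls with the values from the dd run:
# dd_bytes 4096 1048576
# throughput_mb 4294967296 4.55899
```

With the values from the listing, dd_bytes gives 4294967296 and throughput_mb gives 942. At 50MBps, the same 4294967296 bytes would need about 86 seconds, which is in line with the hard drive result quoted in the text.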
Availability
NVDIMMs will probably go on sale to the general public in 2017. To make the Linux kernel suitable for the fast modules, the hardware industry is handing samples to dedicated kernel developers like SUSE Labs. The results flow upstream and will benefit all distributions, which will then roll out the appropriate kernel and drivers step by step. Four distributions can already handle NVDIMMs:
- openSUSE Tumbleweed
- openSUSE Leap 42.2
- SUSE Linux Enterprise 12 SP2
- Fedora 24
During our research, we found no evidence of Debian and Ubuntu following suit, but this will most likely happen before the year is out.
Infos
- JEDEC JC-45: http://www.jedec.org/committees/jc-456-0
- DRAM: https://en.wikipedia.org/wiki/Dynamic_random-access_memory
- Flash Is Good: http://research.microsoft.com/~Gray/talks/Flash_is_Good.ppt
- "DDR4 NVDIMM standardization: Now and future": http://www.jedec.org/sites/default/files/files/Brett_Williams_Server_Forum_2014.pdf
- "Supporting filesystems in persistent memory" by Jonathan Corbet: http://lwn.net/Articles/610174/
- "DAX and fsync: the cost of forgoing page structures" by Neil Brown: http://lwn.net/Articles/676737/
- NVML NVDIMM library: https://github.com/pmem/nvml/
- NVDIMM software architecture: http://pmem.io/2014/08/27/crawl-walk-run.html