Linux device mapper writecache
Kicking It Into Overdrive
Other Caching Tools
Tools earning honorable mention include:
- RapidDisk. This dynamically allocatable memory disk Linux module uses RAM and can also be used as a front-end write-through and write-around caching node for slower media.
- Memcached. A cross-platform userspace library with an API for applications, Memcached also relies on RAM to boost the performance of databases and other applications.
- ReadyBoost. A Microsoft product, ReadyBoost was introduced in Windows Vista and is included in later versions of Windows. Similar to dm-cache and bcache, ReadyBoost enables SSDs to act as a cache for slower HDDs.
Working with dm-writecache
The only prerequisites for using dm-writecache are to be on a Linux distribution running a 4.18 kernel or later and to have Logical Volume Manager 2 (LVM2) installed at version 2.03.x or above. I will also show you how to enable a dm-writecache volume without relying on the LVM2 framework, instead manually invoking dmsetup.
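Before proceeding, it is worth confirming that both requirements are met. The following quick checks are a minimal sketch (output will vary by distribution), and modinfo will only report on dm-writecache if your kernel ships it as a module:

$ uname -r
$ sudo lvm version
$ modinfo dm-writecache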
Identifying and Configuring Your Environment
Identifying the storage volumes and configuring them is a pretty straightforward process (Listing 1).
Listing 1
Storage Volumes
$ cat /proc/partitions
major minor  #blocks  name
   7        0      91264 loop0
   7        1      56012 loop1
   7        2      90604 loop2
 259        0  244198584 nvme0n1
   8        0  488386584 sda
   8        1       1024 sda1
   8        2  488383488 sda2
   8       16 6836191232 sdb
   8       32 6836191232 sdc
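If you are unsure which block device is which, lsblk offers a quick, illustrative check: the ROTA column shows 1 for rotational (spinning) media and 0 for SSDs and NVMe devices:

$ lsblk -d -o NAME,SIZE,ROTA,MODEL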
In my example, I will be using both /dev/sdb and /dev/nvme0n1. As you might have already guessed, /dev/sdb is my slow device, and /dev/nvme0n1 is my NVMe fast device. Because I do not necessarily want to use my entire SSD (the rest could be used as a separate standalone or cached device elsewhere), I will place both the SSD and HDD into a single LVM2 volume group. To begin, I label the physical volumes for LVM2:
$ sudo pvcreate /dev/nvme0n1
  Physical volume "/dev/nvme0n1" successfully created.
$ sudo pvcreate /dev/sdb
  Physical volume "/dev/sdb" successfully created.
Then, I verify that the volumes have been appropriately labeled (Listing 2).
Listing 2
Volume Labels
$ sudo pvs
  PV           VG Fmt  Attr PSize    PFree
  /dev/nvme0n1    lvm2 ---  <232.89g <232.89g
  /dev/sdb        lvm2 ---    <6.37t   <6.37t
Next, I add both volumes into a new volume group labeled vg-cache,
$ sudo vgcreate vg-cache /dev/nvme0n1 /dev/sdb
  Volume group "vg-cache" successfully created
verify that the volume group has been created as seen in Listing 3, and verify that both physical volumes are within it, as in Listing 4.
Listing 3
Volume Group Created
$ sudo vgs
  VG       #PV #LV #SN Attr   VSize VFree
  vg-cache   2   0   0 wz--n- 6.59t 6.59t
Listing 4
Physical Volumes Present
$ sudo pvs
  PV           VG       Fmt  Attr PSize   PFree
  /dev/nvme0n1 vg-cache lvm2 a--  232.88g 232.88g
  /dev/sdb     vg-cache lvm2 a--   <6.37t  <6.37t
Say I want to use 90 percent of the slow disk: I carve a logical volume labeled slow out of the volume group, placing it on the slow device,
$ sudo lvcreate -n slow -l90%FREE vg-cache /dev/sdb
  Logical volume "slow" created.
and verify that the logical volume has been created (Listing 5).
Listing 5
Slow Logical Volume Created
$ sudo lvs vg-cache -o+devices
  LV   VG       Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  slow vg-cache -wi-a----- <5.93t                                                     /dev/sdb(0)
Using the fio benchmarking utility, I run a quick test with random write I/Os against the slow logical volume to get a better understanding of how poorly it performs (Listing 6).
Listing 6
Test Slow Logical Volume
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 \
  --filename=/dev/vg-cache/slow --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=1401KiB/s][r=0,w=350 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3104: Sat Oct 12 14:39:08 2019
  write: IOPS=352, BW=1410KiB/s (1444kB/s)(82.8MiB/60119msec)
[ ... ]
Run status group 0 (all jobs):
  WRITE: bw=1410KiB/s (1444kB/s), 1410KiB/s-1410KiB/s (1444kB/s-1444kB/s), io=82.8MiB (86.8MB), run=60119-60119msec
I see an average throughput of about 1.4 mebibytes per second (MiBps, or roughly 1,410KiBps). Although that number is not great, it is expected when sending a stream of small random writes to an HDD. Remember, with mechanical and movable components, a large percentage of the time is spent seeking to new locations on the disk platters. That seek time introduces latency, so the disk drive takes much longer to return an acknowledgment that a write is persistent on disk.
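If you want to see how much of that penalty is pure seek time, an optional comparison is to rerun the same job sequentially by swapping --rw=randwrite for --rw=write (same device path as above); on most HDDs the sequential number will be dramatically higher:

$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 \
  --filename=/dev/vg-cache/slow --rw=write --numjobs=1 --name=seq-test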
Now, I will carve out a 10GB logical volume from the SSD and label it fast,
$ sudo lvcreate -n fast -L 10G vg-cache /dev/nvme0n1
verify that the logical volume has been created (Listing 7) and verify that it is created from the NVMe drive (Listing 8).
Listing 7
Fast Logical Volume Created
$ sudo lvs
  LV   VG       Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  fast vg-cache -wi-a----- 10.00g
  slow vg-cache -wi-a-----  5.93t
Listing 8
Fast Logical Volume Created from NVMe Drive
$ sudo lvs vg-cache -o+devices
  LV   VG       Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  fast vg-cache -wi-a----- 10.00g                                                     /dev/nvme0n1(0)
  slow vg-cache -wi-a-----  5.93t                                                     /dev/sdb(0)
As in the example above, I will run another quick fio test with the same parameters as earlier (Listing 9).
Listing 9
fio Test
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 \
  --filename=/dev/vg-cache/fast --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=654MiB/s][w=167k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1225: Sat Oct 12 19:20:18 2019
  write: IOPS=168k, BW=655MiB/s (687MB/s)(10.0GiB/15634msec); 0 zone resets
[ ... ]
Run status group 0 (all jobs):
  WRITE: bw=655MiB/s (687MB/s), 655MiB/s-655MiB/s (687MB/s-687MB/s), io=10.0GiB (10.7GB), run=15634-15634msec
Wow! The difference is night and day: about 655MiBps of throughput.
If you have not already, be sure to load the dm-writecache kernel module:
$ sudo modprobe dm-writecache
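To confirm that the module is loaded, and optionally to have it load automatically at boot, something like the following should work (the modules-load.d filename here is just an illustrative choice):

$ lsmod | grep dm_writecache
$ echo dm-writecache | sudo tee /etc/modules-load.d/dm-writecache.conf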
To enable the writecache volume via LVM2, you will first need to deactivate both volumes to ensure that nothing is actively writing to them. To deactivate the SSD, enter:
$ sudo lvchange -a n vg-cache/fast
To deactivate the HDD, enter:
$ sudo lvchange -a n vg-cache/slow
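A quick sanity check that both volumes really are inactive is to look at the lvs attribute string; the fifth character of Attr is an a while a volume is active and a dash once it has been deactivated:

$ sudo lvs -o lv_name,lv_attr vg-cache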
Now, convert both volumes into a single cache volume,
$ sudo lvconvert --type writecache --cachevol fast vg-cache/slow
activate the new volume,
$ sudo lvchange -a y vg-cache/slow
and verify that the conversion took effect (Listing 10).
Listing 10
Conversion
$ sudo lvs -a vg-cache -o devices,segtype,lvattr,name,vgname,origin
  Devices         Type       Attr       LV            VG       Origin
  /dev/nvme0n1(0) linear     Cwi-aoC--- [fast]        vg-cache
  slow_wcorig(0)  writecache Cwi-a-C--- slow          vg-cache [slow_wcorig]
  /dev/sdd(0)     linear     owi-aoC--- [slow_wcorig] vg-cache
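LVM2 also exposes the dm-writecache tunables through --cachesettings. As a hedged example (setting names and availability depend on your LVM2 release, so check the lvmcache(7) man page first), you could raise the dirty-data threshold and the number of concurrent writeback jobs:

$ sudo lvchange --cachesettings 'high_watermark=60 writeback_jobs=1024' vg-cache/slow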
Now it's time to run fio
(Listing 11).
Listing 11
Run fio
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 \
  --filename=/dev/vg-cache/slow --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=475MiB/s][w=122k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1634: Mon Oct 14 22:18:59 2019
  write: IOPS=118k, BW=463MiB/s (485MB/s)(10.0GiB/22123msec); 0 zone resets
[ ... ]
Run status group 0 (all jobs):
  WRITE: bw=463MiB/s (485MB/s), 463MiB/s-463MiB/s (485MB/s-485MB/s), io=10.0GiB (10.7GB), run=22123-22123msec
At about 460MiBps, it's almost 330 times faster than the plain old HDD. This is awesome. Remember, the NVMe is a front-end cache to the HDD, and although all writes are hitting the NVMe, a background thread (or more than one) schedules flushes to the backing store (i.e., the HDD).
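If you are curious how quickly those background flushes drain to the HDD, dmsetup status prints counters for the writecache target. The device-mapper name below is what LVM2 typically generates for vg-cache/slow (dashes in LV and VG names are doubled); adjust it to match your system and consult the dm-writecache kernel documentation for the exact field layout:

$ sudo dmsetup status vg--cache-slow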
If you want to remove the volume, type:
$ sudo lvconvert --splitcache vg-cache/slow
Now you are ready to map the NVMe drive as the writeback cache for the slow spinning drive with dmsetup (in the event that you do not have a proper version of LVM2 installed). To invoke dmsetup, you first need to grab the size of the slow device in 512-byte sectors:
$ sudo blockdev --getsz /dev/vg-cache/slow
12744687616
You will plug this number into the next command and create a writecache device mapper virtual node called wc with a 4K block size:
$ sudo dmsetup create wc --table "0 12744687616 writecache s /dev/vg-cache/slow /dev/vg-cache/fast 4096 0"
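To avoid copying the sector count by hand, a minimal sketch that stores it in a shell variable (using the same device paths as above) looks like this:

# Length of the slow device in 512-byte sectors.
$ SECTORS=$(sudo blockdev --getsz /dev/vg-cache/slow)
# "s" selects an SSD cache device; 4096 is the cache block size; the trailing 0 means no optional arguments.
$ sudo dmsetup create wc --table "0 $SECTORS writecache s /dev/vg-cache/slow /dev/vg-cache/fast 4096 0"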
Assuming that the command returns without an error, a new (virtual) device node will be accessible from /dev/mapper/wc. This is the dm-writecache mapping. Now you need to run fio again, but this time against the newly created device (Listing 12).
Listing 12
Run fio to New Device
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 \
  --filename=/dev/mapper/wc --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=7055: Sat Oct 12 19:09:53 2019
  write: IOPS=34.8k, BW=136MiB/s (143MB/s)(9.97GiB/75084msec); 0 zone resets
[ ... ]
Run status group 0 (all jobs):
  WRITE: bw=136MiB/s (143MB/s), 136MiB/s-136MiB/s (143MB/s-143MB/s), io=9.97GiB (10.7GB), run=75084-75084msec
Although it isn't near the standalone NVMe speeds, you can see a wonderful improvement in random write operations: at roughly 90 times the original HDD performance, throughput reaches 136MiBps. I am not entirely sure which parameters the dmsetup create invocation leaves unconfigured compared with the earlier LVM2 example, but this is still pretty darn good.
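One likely culprit is the list of optional arguments at the end of the writecache table line, which the example above leaves empty (the trailing 0). As a hedged illustration (verify the parameter names against the dm-writecache documentation for your kernel before relying on them), tunables such as high_watermark and writeback_jobs can be passed directly; the count before them is the number of optional tokens, where a parameter and its value count as two. Reusing the SECTORS variable from the earlier sketch:

$ sudo dmsetup create wc --table "0 $SECTORS writecache s /dev/vg-cache/slow /dev/vg-cache/fast 4096 4 high_watermark 60 writeback_jobs 1024"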
To remove the device mapper cache mapping, you first need to manually force a flush of all pending write data to disk:
$ sudo dmsetup message /dev/mapper/wc 0 flush
Now it is safe to enter
$ sudo dmsetup remove /dev/mapper/wc
to remove the mapping.