Rethinking RAID (on Linux)
Madam, I'm mdadm
Often, you find yourself trying to eke out a bit more performance from a computer system you are building or recycling, usually with limited funds at your disposal. Sure, you can tinker with CPU or even memory settings, but if the I/O hitting the system needs to touch the underlying storage devices, those CPU tunings will make little to no difference.
In previous articles, I have shared methods by which one can boost write and read performance to slower disk devices by leveraging both solid state drives (SSD) and dynamic random access memory (DRAM) as a cache [1]. This time, I will instead shift focus to a unique way you can configure redundant storage arrays so that you not only boost overall data access throughput but also maintain fault tolerance. The following examples center around a multiple-device redundant array of inexpensive (or independent) disks (MD RAID) in Linux and its userland utility mdadm [2].
Conventional wisdom has always dictated that spreading I/O load across multiple disk drives, rather than bottlenecking a single drive, helps significantly as workloads increase. For instance, if instead of writing to a single disk drive you split the I/O requests and write that same amount of data as a stripe across multiple drives (e.g., RAID0), you reduce the amount of work that any single drive must perform to accomplish the same task. For magnetic spinning disks (i.e., hard disk drives, HDDs), the advantages should be even more noticeable: The time it takes to seek across the medium introduces latency, and with randomly accessed I/O patterns, throughput on a single drive suffers as a result. A striped approach does not solve all of these problems, but it does help a bit.
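To make the striping idea concrete, here is a minimal mdadm sketch for a two-drive RAID0 stripe. The device names /dev/sdX1 and /dev/sdY1 are placeholders and /dev/md/stripe0 is an arbitrary array name; this is purely illustrative and not part of the configuration built later in this article:

# Purely illustrative: stripe two partitions into one RAID0 device
$ sudo mdadm --create /dev/md/stripe0 --level=0 --raid-devices=2 /dev/sdX1 /dev/sdY1
# Confirm the stripe is active, then tear it down again when finished
$ cat /proc/mdstat
$ sudo mdadm --stop /dev/md/stripe0
$ sudo mdadm --zero-superblock /dev/sdX1 /dev/sdY1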
In this article, I look at something entirely different. I spend more time focusing on increasing read throughput by way of RAID1 mirrors. In the first example, I discuss the traditional read balance in mirrored volumes (where read operations are balanced across both volumes in the mirrored set). The next examples are of read-preferred (or write-mostly) drives in a mirrored volume incorporating non-volatile media such as SSD or volatile media such as a ramdisk.
In my system, I have identified the following physical drives that I will be using in my examples:
$ cat /proc/partitions |grep -e sd[c,d] -e nvm
 259        0  244198584 nvme0n1
 259        2  244197543 nvme0n1p1
   8       32 6836191232 sdc
   8       33  244197544 sdc1
   8       48 6836191232 sdd
   8       49  244197544 sdd1
Notice that I have one non-volatile memory express (NVMe) drive and two serial-attached SCSI (SAS) drives. Later, I will introduce a ramdisk. Also notice that a single partition has been carved out on each drive, all approximately equal in size, which you will see is necessary when working with the RAID logic.
A quick benchmark of one of the SAS volumes with the fio performance benchmarking utility can establish a baseline for both random write (Listing 1) and random read operations (Listing 2). The results show that the single HDD has a throughput of 1.4MBps for random writes and 1.9MBps for random reads.
Listing 1
Random Write Test
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/sdc1 --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=1420KiB/s][w=355 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3377: Sat Jan 9 15:31:04 2021
  write: IOPS=352, BW=1410KiB/s (1444kB/s)(82.9MiB/60173msec); 0 zone resets
[ ... ]
Run status group 0 (all jobs):
  WRITE: bw=1410KiB/s (1444kB/s), 1410KiB/s-1410KiB/s (1444kB/s-1444kB/s), io=82.9MiB (86.9MB), run=60173-60173msec
Disk stats (read/write):
  sdc1: ios=114/21208, merge=0/0, ticks=61/1920063, in_queue=1877884, util=98.96%
Listing 2
Random Read Test
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/sdc1 --rw=randread --numjobs=1 --name=test
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=1896KiB/s][r=474 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3443: Sat Jan 9 15:32:51 2021
  read: IOPS=464, BW=1858KiB/s (1903kB/s)(109MiB/60099msec)
[ ... ]
Run status group 0 (all jobs):
  READ: bw=1858KiB/s (1903kB/s), 1858KiB/s-1858KiB/s (1903kB/s-1903kB/s), io=109MiB (114MB), run=60099-60099msec
Disk stats (read/write):
  sdc1: ios=27838/0, merge=0/0, ticks=1912861/0, in_queue=1856892, util=98.07%
To test a RAID1 mirror's read balance performance, I create a mirrored volume with the two HDDs identified earlier (Listing 3) and then view the status (Listing 4) and details (Listing 5) of the RAID volume.
Listing 3
Create a Mirrored Volume
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
mdadm: Note: this array has metadata at the start and may not be suitable as a boot device. If you plan to store '/boot' on this device please ensure that your boot-loader understands md/v1.x metadata, or use --metadata=0.90
Continue creating array? y
mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
Listing 4
View RAID Status
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdd1[1] sdc1[0]
      244065408 blocks super 1.2 [2/2] [UU]
      [=>...................]  resync = 6.4% (15812032/244065408) finish=19.1min speed=198449K/sec
      bitmap: 2/2 pages [8KB], 65536KB chunk

unused devices: <none>
Listing 5
View RAID Details
$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Sat Jan 9 15:22:29 2021
        Raid Level : raid1
        Array Size : 244065408 (232.76 GiB 249.92 GB)
     Used Dev Size : 244065408 (232.76 GiB 249.92 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Jan 9 15:24:20 2021
             State : clean, resyncing
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

     Resync Status : 9% complete

              Name : dev-machine:0 (local to host dev-machine)
              UUID : a84b0db5:8a716c6d:ce1e9ca6:8265de17
            Events : 22

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
You will immediately notice that the array is initializing: The member disks are being synchronized so that both copies are identical and the array starts out in a known good state. You can definitely use the array in this state, but the resync will affect overall performance. Also, you probably do not want to disable the initial resync with the --assume-clean option. Even if the drives are new out of the box, it is better to know that your array is in a proper state before writing important data to it. This process will definitely take a while, and the bigger the array, the longer the initialization takes. Before proceeding with the follow-up benchmarking tests, you should wait until volume synchronization completes.
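If you would rather not keep polling /proc/mdstat by hand, a small sketch using only standard tooling is to watch the progress or simply block until the resync activity finishes:

# Watch the resync progress interactively ...
$ watch -n 5 cat /proc/mdstat
# ... or block until any resync/recovery activity on md0 has finished
$ sudo mdadm --wait /dev/md0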
Next, verify that the mirror initialization has completed,
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdd1[1] sdc1[0]
      244065408 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk

unused devices: <none>
and repeat the random write and read tests from before, but this time to the RAID volume (e.g., /dev/md0). Remember, the first random writes were 1.4MBps and random reads 1.9MBps. The good news is that whereas random writes dropped a tiny bit to 1.2MBps (Listing 6), random reads increased to almost double the throughput with a rate of 3.3MBps (Listing 7).
Listing 6
Random Write to RAID
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/md0 --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=1280KiB/s][w=320 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3478: Sat Jan 9 15:46:11 2021
  write: IOPS=308, BW=1236KiB/s (1266kB/s)(72.5MiB/60102msec); 0 zone resets
[ ... ]
Run status group 0 (all jobs):
  WRITE: bw=1236KiB/s (1266kB/s), 1236KiB/s-1236KiB/s (1266kB/s-1266kB/s), io=72.5MiB (76.1MB), run=60102-60102msec
Disk stats (read/write):
  md0: ios=53/18535, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=33/18732, aggrmerge=0/0, aggrticks=143/1173174, aggrin_queue=1135748, aggrutil=96.50%
  sdd: ios=13/18732, merge=0/0, ticks=93/1123482, in_queue=1086112, util=96.09%
  sdc: ios=54/18732, merge=0/0, ticks=194/1222866, in_queue=1185384, util=96.50%
Listing 7
Random Read from RAID
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/md0 --rw=randread --numjobs=1 --name=test
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=3184KiB/s][r=796 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3467: Sat Jan 9 15:44:42 2021
  read: IOPS=806, BW=3226KiB/s (3303kB/s)(189MiB/60061msec)
[ ... ]
Run status group 0 (all jobs):
  READ: bw=3226KiB/s (3303kB/s), 3226KiB/s-3226KiB/s (3303kB/s-3303kB/s), io=189MiB (198MB), run=60061-60061msec
Disk stats (read/write):
  md0: ios=48344/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=24217/0, aggrmerge=0/0, aggrticks=959472/0, aggrin_queue=910458, aggrutil=96.15%
  sdd: ios=24117/0, merge=0/0, ticks=976308/0, in_queue=927464, util=96.09%
  sdc: ios=24318/0, merge=0/0, ticks=942637/0, in_queue=893452, util=96.15%
NVMe
Now I will introduce NVMe into the mix – that is, two drives: one NVMe and one HDD in the same mirror. The mdadm utility offers a neat little feature that can be leveraged with the --write-mostly argument, which translates to: Use the following drives for write operations only and not read operations (unless a drive failure were to occur on the volumes designated for read operations).
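Incidentally, you do not have to make this decision only at array creation time. If memory serves, the md sysfs interface also lets you set or clear the write-mostly flag on a member of an existing RAID1 on the fly. A quick sketch (the dev-sdc1 path is an assumption based on my member names; adjust it to match your array):

# Mark /dev/sdc1 in /dev/md0 as write-mostly (reads then prefer the other member)
$ echo writemostly | sudo tee /sys/block/md0/md/dev-sdc1/state
# Clear the flag again
$ echo -writemostly | sudo tee /sys/block/md0/md/dev-sdc1/state
# A write-mostly member shows up with a (W) flag in /proc/mdstat
$ cat /proc/mdstat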
To begin, create the RAID volume (Listing 8). Next, view the RAID volume's details and pay particular attention to the drive labeled writemostly (Listing 9).
Listing 8
Create the RAID Volume
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p1 --write-mostly /dev/sdc1
mdadm: Note: this array has metadata at the start and may not be suitable as a boot device. If you plan to store '/boot' on this device please ensure that your boot-loader understands md/v1.x metadata, or use --metadata=0.90
mdadm: /dev/sdc1 appears to be part of a raid array:
       level=raid1 devices=2 ctime=Sat Jan 9 15:22:29 2021
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
Listing 9
View the RAID Volume
$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Sat Jan 9 15:52:00 2021
        Raid Level : raid1
        Array Size : 244065408 (232.76 GiB 249.92 GB)
     Used Dev Size : 244065408 (232.76 GiB 249.92 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Jan 9 15:52:21 2021
             State : clean, resyncing
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

     Resync Status : 1% complete

              Name : dev-machine:0 (local to host dev-machine)
              UUID : 833033c5:cd9b78de:992202ee:cb1bf77f
            Events : 4

    Number   Major   Minor   RaidDevice State
       0     259        2        0      active sync   /dev/nvme0n1p1
       1       8       33        1      active sync writemostly   /dev/sdc1
Then, repeat the same fio tests by executing the random write test (Listing 10) and the random read test (Listing 11).
Listing 10
Random Write with NVMe
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/md0 --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=1441KiB/s][w=360 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3602: Sat Jan 9 16:14:10 2021
  write: IOPS=342, BW=1371KiB/s (1404kB/s)(80.5MiB/60145msec); 0 zone resets
[ ... ]
Run status group 0 (all jobs):
  WRITE: bw=1371KiB/s (1404kB/s), 1371KiB/s-1371KiB/s (1404kB/s-1404kB/s), io=80.5MiB (84.4MB), run=60145-60145msec
Disk stats (read/write):
  md0: ios=100/20614, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=103/20776, aggrmerge=0/0, aggrticks=12/920862, aggrin_queue=899774, aggrutil=97.47%
  nvme0n1: ios=206/20776, merge=0/0, ticks=24/981, in_queue=40, util=95.01%
  sdc: ios=0/20776, merge=0/0, ticks=0/1840743, in_queue=1799508, util=97.47%
Listing 11
Random Read with NVMe
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/md0 --rw=randread --numjobs=1 --name=test
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=678MiB/s][r=173k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3619: Sat Jan 9 16:14:53 2021
  read: IOPS=174k, BW=682MiB/s (715MB/s)(10.0GiB/15023msec)
[ ... ]
Run status group 0 (all jobs):
  READ: bw=682MiB/s (715MB/s), 682MiB/s-682MiB/s (715MB/s-715MB/s), io=10.0GiB (10.7GB), run=15023-15023msec
Disk stats (read/write):
  md0: ios=2598587/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=1310720/0, aggrmerge=0/0, aggrticks=25127/0, aggrin_queue=64, aggrutil=99.13%
  nvme0n1: ios=2621440/0, merge=0/0, ticks=50255/0, in_queue=128, util=99.13%
  sdc: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
Wow. The NVMe drive yields only a very small increase in random write throughput (up to 1.4MBps), because in a RAID1 mirror you are only as fast as your slowest drive, which keeps write speeds hovering around the original single-HDD baseline. Now look at the random reads. The original single-HDD benchmark for random reads was 1.9MBps, and the read-balanced mirrored HDDs reached 3.3MBps. Here, with the NVMe volume set as the drive from which to read, speeds are a whopping 715MBps! I wonder if it can get even better?
Ramdisk
What would happen if I introduced a ramdisk into the picture? That is, I want to boost read operations but also persist the data after reboots of the system. This process should not be confused with caching. The data is not being staged on the ramdisk temporarily before being persisted into a backing store.
In the next example, the ramdisk will be treated like a backing store, even though the volatile medium technically isn't one. Before I proceed, I will need to carve out a partition roughly the same size as the ramdisk on each HDD. I have chosen a meager 2GB because the older system I am currently using does not have much memory installed to begin with:
$ cat /proc/partitions |grep -e sd[c,d]
   8       48 6836191232 sdd
   8       49  244197544 sdd1
   8       50    2097152 sdd2
   8       32 6836191232 sdc
   8       33  244197544 sdc1
   8       34    2097152 sdc2
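If you are following along and have not yet carved out these 2GB partitions, a rough sketch with parted might look like the following (the start and end offsets are assumptions based on my drive layout, so adjust them to your own partition tables):

# Append a 2GiB partition after the existing data partition on each HDD
$ sudo parted -s /dev/sdc mkpart primary 233GiB 235GiB
$ sudo parted -s /dev/sdd mkpart primary 233GiB 235GiB
# Reread the partition tables and confirm the new sdc2/sdd2 entries
$ sudo partprobe /dev/sdc /dev/sdd
$ cat /proc/partitions | grep -e "sd[c,d]"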
Now to add the ramdisk. As a prerequisite, you need to ensure that the jansson development library is installed on your local machine. Clone the rapiddisk Git repository [3], build the package, install it,
$ git clone https://github.com/pkoutoupis/rapiddisk.git
$ cd rapiddisk/
$ make
$ sudo make install
insert the kernel modules,
$ sudo modprobe rapiddisk
$ sudo modprobe rapiddisk-cache
verify that the modules are installed,
$ lsmod|grep rapiddisk
rapiddisk_cache        20480  0
rapiddisk              20480  0
create a single 2GB ramdisk,
$ sudo rapiddisk --attach 2048
rapiddisk 6.0
Copyright 2011 - 2019 Petros Koutoupis

Attached device rd0 of size 2048 Mbytes
verify that the ramdisk has been created,
$ sudo rapiddisk --list
rapiddisk 6.0
Copyright 2011 - 2019 Petros Koutoupis

List of RapidDisk device(s):

 RapidDisk Device 1: rd0  Size (KB): 2097152

List of RapidDisk-Cache mapping(s):

  None
and create a mirrored volume with the ramdisk as the primary and one of the smaller HDD partitions set to write-mostly (Listing 12). Now, verify the RAID1 mirror state:
Listing 12
Create Mirrored Volume
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/rd0 --write-mostly /dev/sdc2
mdadm: /dev/rd0 appears to be part of a raid array:
       level=raid1 devices=2 ctime=Sat Jan 9 16:32:35 2021
mdadm: Note: this array has metadata at the start and may not be suitable as a boot device. If you plan to store '/boot' on this device please ensure that your boot-loader understands md/v1.x metadata, or use --metadata=0.90
mdadm: /dev/sdc2 appears to be part of a raid array:
       level=raid1 devices=2 ctime=Sat Jan 9 16:32:35 2021
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc2[1](W) rd0[0]
      2094080 blocks super 1.2 [2/2] [UU]

unused devices: <none>
The initialization time should be relatively quick here. Also, verify the RAID1 mirror details (Listing 13) and rerun the random write I/O test (Listing 14). As you saw with the NVMe drive earlier, you also see a small bump in random write operations at approximately 1.6MBps. Remember that you are only as fast as your slowest disk (i.e., the HDD paired with the ramdisk in the mirrored set).
Listing 13
Verify RAID1 Mirror Details
$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Sat Jan 9 16:32:43 2021
        Raid Level : raid1
        Array Size : 2094080 (2045.00 MiB 2144.34 MB)
     Used Dev Size : 2094080 (2045.00 MiB 2144.34 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Sat Jan 9 16:32:54 2021
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              Name : dev-machine:0 (local to host dev-machine)
              UUID : 79387934:aaaad032:f56c6261:de230a86
            Events : 17

    Number   Major   Minor   RaidDevice State
       0     252        0        0      active sync   /dev/rd0
       1       8       34        1      active sync writemostly   /dev/sdc2
Listing 14
Ramdisk Random Write
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/md0 --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=1480KiB/s][w=370 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=5854: Sat Jan 9 16:34:44 2021
  write: IOPS=395, BW=1581KiB/s (1618kB/s)(92.9MiB/60175msec); 0 zone resets
[ ... ]
Run status group 0 (all jobs):
  WRITE: bw=1581KiB/s (1618kB/s), 1581KiB/s-1581KiB/s (1618kB/s-1618kB/s), io=92.9MiB (97.4MB), run=60175-60175msec
Disk stats (read/write):
  md0: ios=81/23777, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/11889, aggrmerge=0/0, aggrticks=0/958991, aggrin_queue=935342, aggrutil=99.13%
  sdc: ios=0/23778, merge=0/0, ticks=0/1917982, in_queue=1870684, util=99.13%
  rd0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
Now, run the random read test (Listing 15). I know I said wow before, but, Wow. Random reads are achieving greater than 1GBps throughput because it is literally only hitting RAM. On faster systems with faster memory, this number should be much larger.
Listing 15
Ramdisk Random Read
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/md0 --rw=randread --numjobs=1 --name=test
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1)
test: (groupid=0, jobs=1): err= 0: pid=5872: Sat Jan 9 16:35:08 2021
  read: IOPS=251k, BW=979MiB/s (1026MB/s)(2045MiB/2089msec)
[ ... ]
Run status group 0 (all jobs):
  READ: bw=979MiB/s (1026MB/s), 979MiB/s-979MiB/s (1026MB/s-1026MB/s), io=2045MiB (2144MB), run=2089-2089msec
Disk stats (read/write):
  md0: ios=475015/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
  sdc: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  rd0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
This setup has a problem, though. It has only a single persistent (or non-volatile) volume in the mirror, and if that drive were to fail, only the volatile memory volume would be left. Also, if you reboot the system, you are in a degraded mode and reading solely from the HDD – until you recreate the ramdisk and rebuild the mirror, that is (which can be accomplished with simple Bash scripts on bootup).
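For the curious, a minimal sketch of such a boot-time script might look like the one below. The script name is my own invention, and it assumes that the degraded /dev/md0 array has already been assembled from its HDD member(s) and that /dev/rd0 was sized at 2048MB, as in this example:

#!/bin/bash
# rebuild-rd0-mirror.sh (hypothetical): recreate the ramdisk and re-add it to md0
set -e

# Load the RapidDisk module and recreate the 2GB ramdisk (it comes up empty)
modprobe rapiddisk
rapiddisk --attach 2048

# Add the fresh ramdisk back into the degraded mirror; md resynchronizes its
# contents from the surviving persistent member(s)
mdadm --manage /dev/md0 --add /dev/rd0

# Optionally block until the rebuild completes
mdadm --wait /dev/md0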
How do you address this problem? A simple solution would be to add a second persistent volume to the mirror, creating a three-copy RAID1 array. If you recall, in the earlier example I carved out a 2GB partition on each of the two HDDs, and both can now be configured with mdadm (Listing 16). When you verify the details (Listing 17), notice that both HDDs are set to writemostly.
Listing 16
Config 2GB Partition
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/rd0 --write-mostly /dev/sdc2 /dev/sdd2
mdadm: /dev/rd0 appears to be part of a raid array:
       level=raid1 devices=2 ctime=Sat Jan 9 16:32:43 2021
mdadm: Note: this array has metadata at the start and may not be suitable as a boot device. If you plan to store '/boot' on this device please ensure that your boot-loader understands md/v1.x metadata, or use --metadata=0.90
mdadm: /dev/sdc2 appears to be part of a raid array:
       level=raid1 devices=2 ctime=Sat Jan 9 16:32:43 2021
mdadm: /dev/sdd2 appears to be part of a raid array:
       level=raid1 devices=2 ctime=Sat Jan 9 16:32:43 2021
Continue creating array? (y/n) y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
Listing 17
Verify 2GB Details
$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Sat Jan 9 16:36:18 2021
        Raid Level : raid1
        Array Size : 2094080 (2045.00 MiB 2144.34 MB)
     Used Dev Size : 2094080 (2045.00 MiB 2144.34 MB)
      Raid Devices : 3
     Total Devices : 3
       Persistence : Superblock is persistent

       Update Time : Sat Jan 9 16:36:21 2021
             State : clean, resyncing
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

     Resync Status : 23% complete

              Name : dev-machine:0 (local to host dev-machine)
              UUID : e0e5d514:d2294825:45d9f09c:db485a0c
            Events : 3

    Number   Major   Minor   RaidDevice State
       0     252        0        0      active sync   /dev/rd0
       1       8       34        1      active sync writemostly   /dev/sdc2
       2       8       50        2      active sync writemostly   /dev/sdd2
Once the volume completes its initialization, do another run of fio benchmarks and execute the random write test (Listing 18) and the random read test (Listing 19). The random writes are back down a bit to 1.3MBps as a result of writing to the extra HDD and the additional latencies introduced by the mechanical drive.
Listing 18
2GB Random Write
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/md0 --rw=randwrite --numjobs=1 --name=test
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=1305KiB/s][w=326 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=5941: Sat Jan 9 16:38:30 2021
  write: IOPS=325, BW=1301KiB/s (1333kB/s)(76.4MiB/60156msec); 0 zone resets
[ ... ]
Run status group 0 (all jobs):
  WRITE: bw=1301KiB/s (1333kB/s), 1301KiB/s-1301KiB/s (1333kB/s-1333kB/s), io=76.4MiB (80.2MB), run=60156-60156msec
Disk stats (read/write):
  md0: ios=82/19571, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/13048, aggrmerge=0/0, aggrticks=0/797297, aggrin_queue=771080, aggrutil=97.84%
  sdd: ios=0/19572, merge=0/0, ticks=0/1658959, in_queue=1619688, util=93.01%
  sdc: ios=0/19572, merge=0/0, ticks=0/732934, in_queue=693552, util=97.84%
  rd0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
Listing 19
2GB Random Read
$ sudo fio --bs=4k --ioengine=libaio --iodepth=32 --size=10g --direct=1 --runtime=60 --filename=/dev/md0 --rw=randread --numjobs=1 --name=test
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
Jobs: 1 (f=1)
test: (groupid=0, jobs=1): err= 0: pid=5956: Sat Jan 9 16:38:53 2021
  read: IOPS=256k, BW=998MiB/s (1047MB/s)(2045MiB/2049msec)
[ ... ]
Run status group 0 (all jobs):
  READ: bw=998MiB/s (1047MB/s), 998MiB/s-998MiB/s (1047MB/s-1047MB/s), io=2045MiB (2144MB), run=2049-2049msec
Disk stats (read/write):
  md0: ios=484146/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
  sdd: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  sdc: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  rd0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
Notice that the 1GBps random read throughput is still maintained, now with the security of an extra volume for protection in the event of a drive failure. However, you will still need to recreate the ramdisk and rebuild the mirrored set on every reboot.
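On a systemd-based distribution, one way to automate that step is a small one-shot unit that runs the hypothetical rebuild script sketched earlier at boot (both the unit name and the script path below are my own inventions):

# /etc/systemd/system/rebuild-rd0-mirror.service (hypothetical)
[Unit]
Description=Recreate the RapidDisk ramdisk and re-add it to /dev/md0
After=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/rebuild-rd0-mirror.sh

[Install]
WantedBy=multi-user.target

Enable it once with systemctl enable rebuild-rd0-mirror.service, and the mirror should rebuild itself shortly after every boot.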
Conclusion
As you can see, you can rely on age-old concepts such as RAID technologies to give you a boost of performance in your computing environments – and without relying on a temporary cache. In some cases, you can breathe new life into older hardware.
Infos
- "Tuning ZFS for Speed on Linux" by Petros Koutoupis, ADMIN, 57, 2020, pp. 44-46
- mdadm(8): https://www.man7.org/linux/man-pages/man8/mdadm.8.html
- The RapidDisk Project: https://github.com/pkoutoupis/rapiddisk