Linux Storage Stack
Stacking Up
In the storage stack context, the end users are typically normal applications (userspace programs). The first component with which Linux programs interact when processing data is the virtual filesystem (VFS). Only through the VFS is it possible to invoke the same system calls for different filesystems on different media. Thanks to the VFS, for example, the cp command transparently copies a file from an ext4 to an ext3 filesystem.
The variety of filesystems – block-based, network, pseudo-, and even Filesystem in Userspace (FUSE) – demonstrates the numerous possibilities that VFS encapsulation opens up. The aforementioned system calls are unified functions, such as open, read, or write, no matter what filesystem is hidden underneath. The specific filesystem operations are abstracted by the VFS, and caches – including the directory entry (dentry) cache – speed up file access.
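To see this unification at work, consider a minimal cp-style copy at the system call level. The sketch below uses placeholder paths standing in for mountpoints of two different filesystems; the same open, read, and write calls work unchanged whichever filesystems back the source and destination.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal cp-style copy: the VFS routes these generic calls to the
 * right filesystem implementation. The paths are placeholders, and
 * short writes are treated as errors for brevity. */
int main(void)
{
    char buf[4096];
    ssize_t n;
    int in  = open("/mnt/ext4vol/data.bin", O_RDONLY);
    int out = open("/mnt/ext3vol/data.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (in < 0 || out < 0) { perror("open"); return 1; }
    while ((n = read(in, buf, sizeof(buf))) > 0)
        if (write(out, buf, n) != n) { perror("write"); return 1; }
    close(in);
    close(out);
    return 0;
}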
The next layer of the storage stack consists of the individual filesystem implementations. They supply the generic methods the VFS expects and translate them into specific calls for accessing the device. The filesystem also performs its primary task – organizing data and metadata on the underlying storage medium. The Linux kernel also speeds up access to these media with a caching mechanism – the Linux page cache.
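The hookup between a filesystem and the VFS happens through tables of operations. The fragment below is only a conceptual sketch, not code from any real filesystem: the myfs_* names are hypothetical, and a genuine implementation fills in many more operations and wires the table into its inodes.

#include <linux/fs.h>
#include <linux/module.h>

/* Hypothetical stub: translate the generic VFS read into
 * filesystem-specific work (here it simply reports EOF). */
static ssize_t myfs_read(struct file *file, char __user *buf,
                         size_t len, loff_t *ppos)
{
        return 0;
}

/* The table the VFS calls through; generic kernel helpers can be
 * reused where no filesystem-specific behavior is needed. */
static const struct file_operations myfs_file_ops = {
        .owner  = THIS_MODULE,
        .open   = generic_file_open,
        .read   = myfs_read,
        .llseek = generic_file_llseek,
};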
Block I/O
For administration within the kernel, flexible block I/O structures (BIOs) are used rather than raw pages. These structures represent the block I/O operations, or requests, that the kernel is currently executing (in-flight BIOs). This applies both to I/O through the page cache and to direct I/O (i.e., access that bypasses the page cache). The advantage of BIOs lies in their ability to handle multiple segments in a single I/O operation.
BIOs consist of a list, or vector, of segments that point to different pages in memory. The result is a dynamic mapping between pages and the sectors on the block device for each block I/O. BIOs come up again in the block layer, where they are grouped into requests before the kernel passes the I/O operations to the driver's dispatch queue.
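In recent kernels, the segment vector is walked with the bio_for_each_segment helper. The function below is a hypothetical sketch (details of the bio API shift between kernel versions) that merely totals the bytes a BIO covers:

#include <linux/bio.h>

/* Walk the bio_vec entries of an in-flight BIO. Each entry names a
 * page, an offset, and a length, so one BIO can map a run of device
 * sectors onto pages scattered across memory. */
static unsigned int count_bio_bytes(struct bio *bio)
{
        struct bio_vec bvec;
        struct bvec_iter iter;
        unsigned int bytes = 0;

        bio_for_each_segment(bvec, bio, iter)
                bytes += bvec.bv_len;

        return bytes;
}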
Stacked Block Devices
An important component of the Linux storage stack is located in front of the block layer, where logical block devices are implemented. They may operate with conventional devices, but they can also perform additional functions, such as layering logical devices on top of each other (i.e., stacked devices). The Logical Volume Manager (LVM) and software RAID (mdraid) are probably the best known representatives. For example, they allow you to scale logical volumes beyond the limits of physical disks or to store data redundantly. Stacking is often used to combine the two, with LVM volumes sitting on top of RAID arrays. As you can see, an I/O request can pass through several of these layers before reaching the block layer.
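At the BIO level, stacking is mostly retargeting: a stacked driver takes an incoming BIO, points it at the device below it, adjusts the target sector, and resubmits it. The fragment below is a hypothetical sketch in the spirit of a linear device-mapper target; the exact resubmission call varies with the kernel version (submit_bio_noacct in recent kernels, generic_make_request in older ones).

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Hypothetical remap step of a stacked driver: retarget the BIO at
 * the lower device, apply a linear sector offset, and hand it back
 * to the block layer for the next level down. */
static void stacked_remap(struct bio *bio, struct block_device *lower,
                          sector_t offset)
{
        bio_set_dev(bio, lower);
        bio->bi_iter.bi_sector += offset;
        submit_bio_noacct(bio);
}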
Block Layer
The block layer processes the BIOs described above and is responsible for forwarding application I/O requests to the storage devices. It provides applications with a uniform interface for data access, whether the target is a hard disk, an SSD, a Fibre Channel (FC) SAN, or another block-based storage device, and it gives all applications a single point of access to storage devices and their drivers. In this way, it conceals the complexity and diversity of storage devices from applications.
The block layer also takes care of fair distribution of I/O access, appropriate error handling, statistics, and a scheduling system. The scheduler in particular is intended to improve performance and to protect users from performance penalties caused by poorly implemented applications or drivers.
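The statistics are easy to inspect from userspace: the block layer exports per-device counters in /proc/diskstats (the field layout is documented in the kernel's Documentation/admin-guide/iostats.rst). A small reader that picks out just the read counters might look like this:

#include <stdio.h>

/* Print a few of the per-device counters the block layer keeps:
 * completed reads and the sectors they moved. */
int main(void)
{
    unsigned int major, minor;
    unsigned long rd_ios, rd_merges, rd_sectors, rd_ticks;
    char name[32], line[512];
    FILE *f = fopen("/proc/diskstats", "r");

    if (!f) { perror("/proc/diskstats"); return 1; }
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%u %u %31s %lu %lu %lu %lu",
                   &major, &minor, name, &rd_ios, &rd_merges,
                   &rd_sectors, &rd_ticks) == 7)
            printf("%-12s reads=%lu sectors=%lu\n",
                   name, rd_ios, rd_sectors);
    }
    fclose(f);
    return 0;
}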
Depending on the device drivers and configuration involved, the data can take one of three paths through or around the block layer (a quick way to check which scheduler a given device uses is sketched after this list):
- Via one of the three traditional I/O schedulers: NOOP, Deadline, or CFQ. A fourth traditional scheduler, the anticipatory scheduler (AS), was similar to CFQ and was therefore removed in kernel version 2.6.33.
- Via the Linux multiqueue block I/O queuing mechanism (blk-mq). Introduced with Linux kernel 3.13, blk-mq is relatively new. The kernel developers designed it mainly for high-performance flash storage, such as PCIe SSDs, to break through the IOPS (I/O operations per second) limits of the traditional I/O schedulers.
- Around the block layer, directly to BIO-based drivers. In the period before blk-mq, manufacturers of PCIe SSDs developed these elaborate drivers, which have to implement many of the block layer's tasks themselves.
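Which path and scheduler a device uses can be read from sysfs: /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in brackets (e.g., noop deadline [cfq] on a legacy single-queue setup, or [mq-deadline] none under blk-mq). A minimal check, assuming a device named sda:

#include <stdio.h>

/* Read the scheduler line for one device; the entry in brackets is
 * the active scheduler. "sda" is an assumed device name. */
int main(void)
{
    char buf[256];
    FILE *f = fopen("/sys/block/sda/queue/scheduler", "r");

    if (!f) { perror("scheduler"); return 1; }
    if (fgets(buf, sizeof(buf), f))
        printf("sda: %s", buf);
    fclose(f);
    return 0;
}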