What is an IOPS Really?
Many articles have explored the performance aspects of filesystems, storage systems, and storage devices. Classically, performance results are reported with statements such as “Peak throughput is x MBps” or “Peak IOPS is x.” However, what does “IOPS” really mean and how is it defined?
Typically, an IOP is an I/O operation, wherein data is either read or written to the filesystem and subsequently the storage device, although other IOPs exist that don’t strictly include a read or write I/O operation (more on that later).
The number of input/output operations per second (IOPS) sounds simple enough, but the term has no hard standard definition. For example, what I/O functions are used during IOPS testing? If the I/O functions involve reading or writing data, how much data is used for a read or write?
Despite not having a precise definition, IOPS is a very important storage performance measure for applications. Think about the serial portion of Amdahl’s Law, which typically includes I/O. Getting data to and from the storage device as quickly as possible affects application performance and scalability. With the large number of cores in today’s systems, either several applications run at the same time – all possibly performing I/O – or a running application uses a large number of processes, all possibly performing I/O. Storage performance is under even more pressure.
IOPS Specifics
The IOPS acronym implies that more I/O operations per second is better than fewer. The larger the IOPS, the better the storage performance. As a consequence, an important aspect of measuring IOPS is the size of the data used in the I/O function. (I use the terminology of the networking world and refer to this as the “payload size.”) Does the I/O operation involve just a single byte or does it involve 1MiB, 1GiB, 1TiB?
Most of the time, IOPS are reported as a plain number (e.g., 100,000). Because IOPS has no standard definition, such a number is meaningless unless the payload size is stated along with it. Over time, however, an “accepted” payload size for measuring IOPS has emerged: 4KB.
The kilobyte is defined as 1,000 bytes and is grounded in base 10 (10^3). Over time, kilobyte has been incorrectly used to mean numbers grounded in base 2, or 1,024 bytes (2^10). This usage is incorrect. The correct notation for 1,024 bytes is kibibyte (KiB). In this article I use the correct kibibyte unit notation to mean 1,024 bytes, so that 4KiB is 4,096 bytes. Therefore, the commonly used read or write payload is 4,096 bytes or 4KiB.
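The unit distinction matters as soon as an IOPS number is converted into a data rate. The following minimal sketch shows the arithmetic, assuming every operation moves exactly one payload; the function name `throughput_bytes_per_s` is made up for illustration, and the 100,000 IOPS figure simply reuses the example number from above.

```python
KB = 1000      # kilobyte, base 10
KIB = 1024     # kibibyte, base 2
MB = 1000**2   # megabyte
MIB = 1024**2  # mebibyte


def throughput_bytes_per_s(iops: float, payload_bytes: int) -> float:
    """Bytes moved per second if every operation transfers payload_bytes."""
    return iops * payload_bytes


payload = 4 * KIB
rate = throughput_bytes_per_s(100_000, payload)

print(payload)      # 4096 bytes, not the 4000 bytes of 4KB
print(rate / MB)    # 409.6 (MB/s)
print(rate / MIB)   # 390.625 (MiB/s)
```

The same IOPS number therefore corresponds to a different data rate for every payload size, which is why the payload has to be stated.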
Just to emphasize the point, whereas 4KiB commonly is used for IOPS measurements, IOPS has no real definition, particularly for the payload size. If IOPS numbers are stated and they have no payload size associated with them, you cannot be sure what the number really means. The number could be 4KiB, but without a clear statement, you don’t know. I view this as a criminal storage offense (pay the bailiff on the way out).
The reason 4KiB is so commonly used is that it corresponds to the page size on almost all Linux systems and usually (but not always) produces the best IOPS result. However, in light of no formal definition, it is safe to say that not all reported IOPS numbers use this size. Sometimes, the results are reported with the use of 1KiB sizes – or even a 1-byte size.
Applications don’t always do I/O in 4KiB increments, so why should IOPS be restricted to a single payload size? In the absence of a definition, any payload size can be used. Personally, because applications use various payload sizes, I want to see IOPS measures for a range of I/O operation sizes. I like to see results with payload sizes of 1KiB (in case really small payload sizes have some exceptional performance); 4KiB, 32KiB, or 64KiB; and maybe even 128KiB, 256KiB, or 1MiB. The reason I like to see a range of payload sizes is that it allows me to compare it to the spectrum of payload sizes in my applications. However, if I must pick a single payload size, then it should be 4KiB. Most importantly, I want the publisher of any IOPS numbers to tell me the payload size they used.
IOPS Categories
Up to this point, the type of I/O operation has not been specified. Traditionally, IOPS are used to measure data throughput. Accordingly, IOPS are measured by read or write I/O functions, and many times two IOPS results are reported: one for only read operations and one for only write operations. Think of these as guardrail measurements that define the limit of read and write IOPS performance.
Again, because IOPS is not a defined standard, IOPS could be reported with a mixture of read and write operations. For example, it could be defined as a read/write mixture of 50/50 or 25/75. Because of the lack of a definition, whoever is reporting the IOPS results should define whether the number is an all-read, all-write, or mixture of operations. Personally, I would like to see all-read IOPS, all-write IOPS, and at least one read/write IOPS mixture (more is better).
IOPS reported as a mixture of read and write operations is a step forward, in my opinion, but reporting a mixed number can also be ambiguous. For example, does reporting the read/write IOPS as 25/75 mean the test did three write operations followed by a read operation? It could mean one read operation followed by three write operations or two write operations followed by one read operation and then another write operation. Without specifying the pattern, the mixed read/write IOPS measurement becomes uncertain, reducing the worth of reporting the number.
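The ambiguity is easy to see in a small sketch. The code below builds several operation sequences that all qualify as a 25/75 read/write mix yet issue the operations in different orders. The helper names and the "R"/"W" notation are assumptions made for this illustration and do not come from any benchmark tool.

```python
import random


def fixed_pattern(pattern: str, total_ops: int) -> list[str]:
    """Repeat a fixed pattern such as 'WWWR' until total_ops operations exist."""
    return [pattern[i % len(pattern)] for i in range(total_ops)]


def random_pattern(read_fraction: float, total_ops: int, seed: int = 0) -> list[str]:
    """Build an exact read/write mix and shuffle it into a random order."""
    reads = round(total_ops * read_fraction)
    ops = ["R"] * reads + ["W"] * (total_ops - reads)
    random.Random(seed).shuffle(ops)
    return ops


def read_share(ops: list[str]) -> float:
    """Fraction of operations in the sequence that are reads."""
    return ops.count("R") / len(ops)


candidates = {
    "three writes, then a read": fixed_pattern("WWWR", 12),
    "a read, then three writes": fixed_pattern("RWWW", 12),
    "write, write, read, write": fixed_pattern("WWRW", 12),
    "shuffled": random_pattern(0.25, 12),
}
for name, ops in candidates.items():
    print(f"{name:28s} {''.join(ops)}  read share = {read_share(ops)}")
```

Every sequence reports a read share of 0.25, so the ratio alone does not say which of them was actually tested.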
Although some applications perform all reads or all writes during portions of the execution, many applications use a mixture of reads and writes, so I want that mixed read/write IOPS result reported.
Personally, at a very minimum, I would like to see IOPS reported as:
- 4KiB read IOPS = x
- 4KiB write IOPS = y
- 4KiB (x% read/y% write) IOPS = w
The third IOPS measure should report the read/write pattern.
Beyond these three numbers, as I mentioned earlier, I want to see IOPS with different I/O function payload sizes. Although 4KiB is mandatory because it is the closest to a standard, I know of applications that do a great number of I/O functions in the 1–100-byte range. Knowing the same three IOPS measures – read, write, and a read/write mix – for a range of I/O payloads can be key to relating the IOPS numbers to application performance. The same is true for larger payloads, perhaps in the 32KiB to 1MiB range.
Diving deeper, a commonly overlooked aspect of measuring IOPS is whether the I/O operations are sequential or random. With sequential IOPS, the I/O operations happen with sequential blocks of data. For example, block 233 is used for the first I/O operation, followed by block 234, followed by block 235, and so on. (This discussion assumes the payload is 4KiB aligning with the blocks.) With random IOPS, the first I/O operation could be on block 233 and the second could be on block 8675309, or something like that.
With random I/O access, the tests can defeat data caches because the requested blocks are unlikely to be in the cache, resulting in IOPS measures that are more realistic. If you run several applications at the same time, the blocks needed by each application may be widely separated on the storage device. One application might need something from one range of blocks, and another might use a different block range. Depending on the sequence of the I/O operations, to the filesystem and storage devices, this looks like random I/O access.
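As a rough illustration of the two access patterns, the sketch below generates sequential and random 4KiB-aligned offsets and times `os.pread` calls against a scratch file. It is a toy under several assumptions: the file is small and freshly written, so most reads are served from the page cache rather than the device, and the names `sequential_offsets`, `random_offsets`, and `measure_read_iops` are invented for this example. A real measurement would use one of the benchmark tools discussed later in this article.

```python
import os
import random
import tempfile
import time

BLOCK = 4096  # 4KiB payload, aligned with the block size as assumed in the text


def sequential_offsets(first_block: int, count: int) -> list[int]:
    """Byte offsets for blocks first_block, first_block + 1, ..."""
    return [(first_block + i) * BLOCK for i in range(count)]


def random_offsets(total_blocks: int, count: int, seed: int = 0) -> list[int]:
    """Byte offsets for blocks picked uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(total_blocks) * BLOCK for _ in range(count)]


def measure_read_iops(path: str, offsets: list[int]) -> float:
    """Issue one BLOCK-sized read per offset and return operations per second."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for offset in offsets:
            os.pread(fd, BLOCK, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return len(offsets) / elapsed


total_blocks = 1024  # 4MiB scratch file
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(os.urandom(total_blocks * BLOCK))
    scratch_path = scratch.name
try:
    seq = measure_read_iops(scratch_path, sequential_offsets(0, total_blocks))
    rnd = measure_read_iops(scratch_path, random_offsets(total_blocks, total_blocks))
    print(f"4KiB sequential read IOPS = {seq:,.0f}")
    print(f"4KiB random read IOPS = {rnd:,.0f}")
finally:
    os.unlink(scratch_path)
```

Because of caching, the two numbers printed here will probably be close to each other; the difference between the patterns shows up when the reads actually reach the storage device.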
Keeping this in mind, the list of IOPS that should be reported is now:
- 4KiB random read IOPS = x
- 4KiB random write IOPS = y
- 4KiB sequential read IOPS = x
- 4KiB sequential write IOPS = y
- (Optional) 4KiB (x% read/y% write) (sequential/random) IOPS = w
Now up to four IOPS numbers should be reported with a highly recommended fifth number that uses a mixture of read and write operations.
Beyond I/O function payload sizes and sequential or random data access, further options can be used to get the best possible IOPS results (e.g., a read-ahead cache in the operating system or a storage device). Reporting random access IOPS results can, one hopes, provide insight into what happens without cache effects.
Another option for tuning IOPS performance is queue depth, which is a measure of the number of I/O operations queued together before execution. A queue provides the opportunity for the operating system to order the I/O operations to make data access more sequential, but it does so by using more memory, a few more CPU cycles, and a delay before executing the I/O operations. Holding I/O operations in system memory can be a little dangerous, in that if the system loses power, those operations can be lost. However, the queues are flushed very quickly, so you might not be affected by the loss of power.
I commonly see varying queue depths for Windows IOPS testing. Linux does a pretty good job setting good queue depths, so there is much less need to change the default of 128 because it provides good overall performance. However, depending on the workload or the benchmark, you can adjust the queue depth to produce possibly better performance. Be warned that if you change the queue depth for a particular benchmark or workload, application performance beyond these specific applications could be affected.
You can check what I/O scheduler is being used and the corresponding queue depth by querying the /sys filesystem. An example of a mechanical disk (e.g., sdx) might be:
$ cat /sys/block/sd*/queue/scheduler
[mq-deadline] none
$ cat /sys/block/sd*/queue/nr_requests
64

or an NVMe drive:

$ cat /sys/block/nvme1n1/queue/scheduler
[none] mq-deadline
$ cat /sys/block/nvme1n1/queue/nr_requests
1023
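The same information can be collected from a script. In the scheduler file, the active scheduler is the entry in square brackets. The sketch below parses that format and reads both files for one device; the function names and the returned dictionary layout are assumptions for illustration, and `queue_settings` only works on a Linux system where the named device exists.

```python
from pathlib import Path
from typing import Optional


def parse_scheduler(text: str) -> tuple[Optional[str], list[str]]:
    """Split a line such as '[mq-deadline] none' into (active, available).

    The active scheduler is the bracketed entry; None is returned for it
    if no entry is bracketed.
    """
    names = text.split()
    active = next((n.strip("[]") for n in names if n.startswith("[")), None)
    return active, [n.strip("[]") for n in names]


def queue_settings(device: str) -> dict:
    """Read the scheduler and nr_requests files for a device such as 'nvme1n1'."""
    queue = Path("/sys/block") / device / "queue"
    active, available = parse_scheduler((queue / "scheduler").read_text())
    return {
        "device": device,
        "scheduler": active,
        "available": available,
        "nr_requests": int((queue / "nr_requests").read_text()),
    }


print(parse_scheduler("[mq-deadline] none"))  # ('mq-deadline', ['mq-deadline', 'none'])
print(parse_scheduler("[none] mq-deadline"))  # ('none', ['none', 'mq-deadline'])
```

Calling `queue_settings("nvme1n1")` on the NVMe system shown above would be expected to report the `none` scheduler and an `nr_requests` value of 1023.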
Because queue depth can be varied, the IOPS performance results could (should) be published with the queue depth:
- 4KiB random read IOPS = x (queue depth = z)
- 4KiB random write IOPS = y (queue depth = z)
- 4KiB sequential read IOPS = x (queue depth = z)
- 4KiB sequential write IOPS = y (queue depth = z)
- (Optional) 4KiB random (x% read/y% write) IOPS = w (queue depth = z)
Another example of selecting options for better performance is the Linux I/O scheduler, which sorts incoming I/O requests into request queues, where they are optimized for the best possible device access. Two types of I/O schedulers exist in recent kernels: multiqueue I/O schedulers and non-multiqueue I/O schedulers.
The three non-multiqueue I/O schedulers are:
- Completely fair queuing (CFQ)
- Deadline
- NOOP (no-operation)
There are various reasons for using one over the other, but it is worthwhile to try all three to get the best IOPS performance. The best IOPS performance, however, does not necessarily mean the best application performance, so be sure to test real applications when you change the scheduler.
With the advent of very fast storage devices (think solid-state storage), the storage bottleneck moved from the storage device to the operating system. These devices can achieve highly parallel access, which improves I/O performance but requires a new way to think about how I/O operations are queued and scheduled. Multiqueue I/O schedulers were created for these cases.
More discussion about the specific multiqueue schedulers can be found online. The current list is:
- BFQ (budget fair queuing)
- Kyber
- None (like NOOP)
- mq-deadline (like a multiqueue deadline scheduler)
Note that some of these schedulers can be tuned for the best performance.
Including the specific I/O scheduler as part of the reported IOPS results makes the list of possible variations grow very quickly, which is a consequence of not having a standard IOPS definition. Overall, I would personally like to see the five IOPS numbers in the bullet list above, with the addition of the specific I/O scheduler. This report would give at least some upper bounds to IOPS performance. Furthermore, reporting IOPS for different I/O function payloads is important because it provides a wider view of IOPS results, and they can help in understanding application performance.
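One way to keep all of these parameters attached to a number is to store them together and generate the report line from the record. The sketch below is only one possible layout: the `IopsResult` class, its field names, and the values in the usage lines are invented placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IopsResult:
    iops: float
    payload_bytes: int
    access: str                    # "random" or "sequential"
    read_percent: int              # 100 = all reads, 0 = all writes
    queue_depth: int
    scheduler: str
    pattern: Optional[str] = None  # e.g. "WWWR" for a mixed test

    def describe(self) -> str:
        """Format the result in the style of the bullet lists above."""
        if self.payload_bytes % 1024 == 0:
            size = f"{self.payload_bytes // 1024}KiB"
        else:
            size = f"{self.payload_bytes}B"
        if self.read_percent == 100:
            op = "read"
        elif self.read_percent == 0:
            op = "write"
        else:
            op = f"({self.read_percent}% read/{100 - self.read_percent}% write)"
            if self.pattern:
                op += f" [pattern {self.pattern}]"
        return (
            f"{size} {self.access} {op} IOPS = {self.iops:,.0f} "
            f"(queue depth = {self.queue_depth}, scheduler = {self.scheduler})"
        )


# Placeholder values, for illustration only.
print(IopsResult(iops=100_000, payload_bytes=4096, access="random",
                 read_percent=100, queue_depth=128,
                 scheduler="mq-deadline").describe())
print(IopsResult(iops=80_000, payload_bytes=4096, access="random",
                 read_percent=25, queue_depth=128, scheduler="none",
                 pattern="WWWR").describe())
```

Because none of the fields except the pattern has a default, a result cannot be created without stating the payload size, access type, queue depth, and scheduler.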
Total IOPS
Another IOPS measure I have come to value, which I refer to as “Total” IOPS, is the measure of I/O operations per second for non-read and non-write filesystem or storage operations. You could think of Total IOPS as “metadata” IOPS if read and write operations are not included. However, you could also include read and write operations so that you get the sum of all possible I/O operations.
Metadata operations include I/O functions such as stat, fstat, getdents, or fsync that don’t really have a “payload,” as do read or write functions. However, they all still affect performance because they can cause data to be read or written to storage. I consider them important to application performance because applications might call them in rapid succession. Note that the payloads for these metadata I/O functions are very small or non-existent, so to me, this is something very much like IOPS, hence the name, Total IOPS.
Currently, tests for metadata rates include mdtest, fdtree, postmark, and md-workbench. These benchmark codes test overall I/O functions such as creating and deleting, renaming, and gathering statistics. These tests are done in isolation and not in combination with each other, and sometimes they are run in a single directory or as part of a directory tree with files at each directory level.
The metadata operations that are tested in the previously mentioned benchmarks are limited to a very few scenarios. Nonetheless, applications use more than these, sometimes repeatedly throughout the application execution and sometimes even in rapid succession. Therefore, knowing how quickly these operations can be performed (i.e., IOPS) is an important part of testing filesystems, storage devices, and operating system parameters. Which I/O functions are called varies depending on the application and even the size of the data set. What is important is measuring or testing the IOPS associated with these metadata operations.
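To give a feel for what a metadata rate looks like, the sketch below times create, stat, rename, and unlink calls on small empty files in a single temporary directory. It is a toy and not a substitute for the benchmarks named above: the helper names are invented, each phase runs in isolation, the "create" phase counts an open plus a close as one operation, and the results depend heavily on caching and on which filesystem backs the temporary directory.

```python
import os
import tempfile
import time


def timed_rate(func, items) -> float:
    """Call func once per item and return calls per second."""
    start = time.perf_counter()
    for item in items:
        func(item)
    return len(items) / (time.perf_counter() - start)


def create_empty(path: str) -> None:
    """Create an empty file (one open plus one close)."""
    with open(path, "w"):
        pass


def measure_metadata_rates(directory: str, count: int = 1000) -> dict[str, float]:
    """Time create, stat, rename, and unlink for count files in one directory."""
    paths = [os.path.join(directory, f"f{i:06d}") for i in range(count)]
    renamed = [p + ".renamed" for p in paths]
    return {
        "create": timed_rate(create_empty, paths),
        "stat": timed_rate(os.stat, paths),
        "rename": timed_rate(lambda pair: os.rename(*pair), list(zip(paths, renamed))),
        "unlink": timed_rate(os.unlink, renamed),
    }


with tempfile.TemporaryDirectory() as scratch_dir:
    for name, rate in measure_metadata_rates(scratch_dir).items():
        print(f"{name}: {rate:,.0f} ops/s")
```

Summing the individual rates would not be meaningful; a combined figure in the spirit of Total IOPS would need the operations interleaved the way an application issues them.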
Measuring IOPS
Several tools are commonly used for measuring read/write IOPS on systems. The first one I want to mention is Iometer, which you commonly see used on Windows systems. The one I most commonly use is IOzone, an open source tool that is easy to build and use, has a number of test options, and even allows you to vary the data “compressibility” for I/O function payloads.
Another common tool for testing read/write IOPS is fio, which lets you run a wide variety of tests, including mixing read and write IOPS.
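As a hedged illustration of how the parameters discussed in this article map onto a fio run, the sketch below only assembles a command line and prints it; it does not run fio. The option values (job name, run time, file size, queue depth of 32, the `libaio` engine) and the file path are assumptions chosen for the example, so check them against the fio documentation for your version. Keep in mind that write tests overwrite the target file or device.

```python
import shlex
from typing import Optional


def fio_command(
    filename: str,
    rw: str = "randread",      # e.g. read, write, randread, randwrite, randrw
    block_size: str = "4k",    # payload size per I/O operation
    iodepth: int = 32,         # I/O operations kept in flight
    read_percent: Optional[int] = None,  # read share for mixed workloads
    runtime_s: int = 60,
    size: str = "1G",
) -> list[str]:
    """Build an argument list for a single fio job with JSON output."""
    cmd = [
        "fio",
        "--name=iops-test",
        f"--filename={filename}",
        f"--rw={rw}",
        f"--bs={block_size}",
        f"--iodepth={iodepth}",
        "--ioengine=libaio",
        "--direct=1",          # bypass the page cache
        f"--size={size}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--output-format=json",
    ]
    if read_percent is not None:
        cmd.append(f"--rwmixread={read_percent}")
    return cmd


# Hypothetical target path; replace it with a scratch file you can overwrite.
print(shlex.join(fio_command("/path/to/testfile")))
print(shlex.join(fio_command("/path/to/testfile", rw="randrw", read_percent=25)))
```

The second command describes a 4KiB random 25/75 read/write mix, so the payload size, access type, mix, and queue depth are all recorded in the command line and can be published with the result.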
The last tool I want to mention is IOR, which is commonly used in testing parallel storage solutions often found in HPC. It is also used as part of the IO500 list. You can use it to test really large amounts of I/O and vary the I/O function payload size, as well as either read or write operations.
Summary
IOPS is an important and often used I/O performance test. It can be very useful because many applications use very small I/O payloads, and executing them quickly improves application performance. However, although the storage world has sort of created a definition that is vague and ripe for abuse, there is no standard definition of an IOPS.
In this article, I hope I have explained what an IOPS is and what goes into its definition. Parameters such as the type, function payload size, and pattern of I/O operations, as well as the operating system I/O scheduler and queue depth, are critical when reporting and understanding IOPS results.