Tuning SSD RAID for optimal performance

Flash Stack

Overprovisioning

As already mentioned, the size of the spare area has a direct effect on the random write performance of an SSD. Even a small increase in the spare area will significantly increase the random write performance and durability of an SSD. This recommendation from the early days of the SSD era is still valid today. For the current crop of SSDs, however, durability is already sufficient to make an increase in the spare area useful for performance reasons.

You can easily increase the size of the spare area with manual overprovisioning by only configuring part of the capacity of the SSD for the RAID set (about 80 percent) and leaving the rest blank. If you have previously used the SSD, perform a secure erase first to wipe all the flash cells. Only then can the SSD controller use this free space as a spare area. As Figure 5 shows, write IOPS almost tripled given 20 percent overprovisioning (with an Intel DC S3500 800GB SSD).

Figure 5: In an Intel DC S3500 800 GB SSD with 20 percent overprovisioning, the potential write IOPS performance increases from 12,000 IOPS to 33,000 IOPS; at 40 percent, this is even 47,000 IOPS. For read-only access, the performance remains unchanged.

A little note on the side: Overprovisioning does not affect the read-only or sequential write performance; that is, performance benefits are only up for grabs if you have a random mix of read/write or write-only workloads.

Linux Deadline I/O Scheduler

For SSDs and SSD RAID the deadline I/O scheduler offers the best performance in my experience. As of Ubuntu 12.10, Ubuntu uses the deadline scheduler by default; other distributions still partially rely on the CFQ scheduler, which was optimized with hard disks in mind. Typing a simple

cat /sys/block/sda/queue/scheduler

lets you check which scheduler is used (for the sda device) in this case. Working as root, you can set the scheduler to deadline scheduler like this:

echo deadline >   /sys/block/sda/queue/scheduler

For details of permanently changing the scheduler, see the documentation of the Linux distribution that you use. In the medium term, the Linux multiqueue block I/O queueing mechanism (blk-mq ) will replace the traditional I/O schedulers for SSDs. But, it will take some time for blk-mg to find its way into the long-term support enterprise distributions.

Monitoring the Health of SSDs and RAID

The flash cells in an SSD can only handle a limited number of write cycles. The finite lifetime is attributable to the internal structure of the memory cells, which are exposed to program/erase (P/E) cycles during write access. The floating gate of the cell, which is used to write to the cell, wears in each cycle. If the wear threshold of a certain cell is exceeded, the SSD's controller tags the cell as a bad block and replaces it with one from the spare area. At least two indicators are derived directly from this function: (1) the wear of the flash cells via the media wearout indicator and (2) the number of remaining spare blocks (a.k.a. available reserved space).

Optimally, the manufacturer passes these two values on to the user as self-monitoring analysis and reporting technology (SMART) attributes. For proper monitoring of the attributes, a detailed SMART specification of the SSD is essential. The interpretation of these values is not standardized and varies from manufacturer to manufacturer. As an example, Intel and Samsung attributes differ in terms of ID and name, but the values they use are at least equal. Using the smartctl command-line tool, you can access SMART attributes for your SSD; this is also possible for megaraid and sg devices residing behind Avago MegaRAID and Adaptec controllers,

smartctl -a /dev/sda
smartctl -a -d megaraid,6 /dev/sda
smartctl -a -d sat /dev/sg1

illustrating how important it is that the manufacturers disclose their specifications for reliable monitoring. Intel is exemplary in its datacenter SSDs and provides detailed specifications, including SMART attributes.

The integration of SMART monitoring into a monitoring framework like Icinga is the next step. A plugin must be able to respond to the manufacturer's specification and interpret the attribute values correctly. The Thomas Krenn team has developed check_smart_attributes [4], a plugin that maps the attribute specifications to a JSON database. As of version 1.1, the plugin also monitors SSDs residing behind RAID controllers.

SMART attributes uniquely determine the wear of flash cells, from which the durability of the SSD is derived. Other attributes complete the SSD health check for enterprise use. At Intel, one example is the Power_Loss_Cap_Test attribute, which relates to the correct function of the SSD's integrated cache protection. Cache protection ensures that no data is lost in a power outage. This attribute shows the importance of regular testing of SSD SMART attributes. You need to make sure detailed information for the SMART attributes exists before you deploy SSDs on a larger scale.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy ADMIN Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

comments powered by Disqus