
SMART storage device monitoring

Distress Signals

Article from ADMIN 59/2020

By Jeff Layton

Most storage devices have SMART capability, but can it help you predict failure? We look at ways to take advantage of this built-in monitoring technology with the smartctl utility from the Linux smartmontools package.

S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) is a monitoring system for storage devices that reports the status of a device and can run self-tests. Administrators can use it to check the health of their storage devices and to run periodic self-tests that reveal the state of each device.

In 1992, IBM became the first company to add some monitoring and information capability to its drives. Other vendors followed suit, and Compaq led an effort to standardize the way drive health is monitored and reported. This push for standardization led to S.M.A.R.T. [1]. (Although S.M.A.R.T. is the correct abbreviation, it's not nearly as easy to type, so I will be using SMART throughout the remainder of the article.)

Over time, SMART capability has been added to many drives, including PATA, SATA, and the many varieties of SCSI, SAS, and solid-state drives, as well as NVM Express (commonly referred to as NVMe) and even eMMC drives. Under the standard, drives measure the appropriate health parameters and make the results available to the operating system or other monitoring tools. However, each drive vendor is free to decide which parameters are monitored and what their thresholds are (i.e., the points at which the drive has "failed"). Note that I use "drive" as a generic term for a storage device in this article.
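To make the idea of vendor-defined thresholds concrete, here is a minimal Python sketch that reads the attribute table printed by smartctl -A for an ATA drive and flags any attribute whose normalized value has dropped to or below its threshold. It assumes the usual 10-column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function names are hypothetical and chosen only for illustration, and NVMe and SCSI devices report health in a different format that this sketch does not handle.

```python
import subprocess


def parse_attributes(report):
    """Extract ATA attribute rows from `smartctl -A` style text.

    Assumed column order (not guaranteed for every drive or version):
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    rows = []
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip headers and banners.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            value = int(fields[3])
            worst = int(fields[4])
            thresh = int(fields[5])
        except ValueError:
            continue  # not a well-formed attribute row
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": value,
            "worst": worst,
            "thresh": thresh,
        })
    return rows


def failing_attributes(rows):
    """Return rows whose normalized value is at or below the vendor threshold."""
    return [row for row in rows if row["value"] <= row["thresh"]]


def read_attributes(device):
    """Run smartctl on a device (hypothetical helper).

    Assumes smartctl is installed and the caller has enough privileges.
    """
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses nonzero exit codes to carry status bits
    )
    return parse_attributes(result.stdout)
```

On a system with smartmontools installed, failing_attributes(read_attributes("/dev/sda")) would list the attributes the drive itself considers past their limits. The device path is only an example. Because each vendor picks its own attributes and thresholds, an empty list means only that no vendor-defined limit has been crossed.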

For a drive to be considered "SMART," it only needs the ability to signal between its internal sensors and the host computer. Nothing in the standard defines which sensors are in the drive or how the data is exposed to the user. However, at the lowest level, SMART provides a single binary piece of information: the drive is OK or the drive has failed. This piece of information is called the SMART status. Many times the output DISK FAILING doesn't indicate that the drive

...
