How to query sensors for helpful metrics

Measuring Up

Metrics dashboards in Grafana or similar tools show you how your infrastructure is performing. Querying sensors gives you additional information (e.g., voltages, temperatures, and fan speeds) that you can use to analyze and protect against failures. In this article, I look at the tools you can use to query sensors.

Previous articles looked at tools such as InfluxDB, Telegraf, and Collectd to retrieve metrics from running systems and visualize the results in Grafana. Thus far, however, the articles have focused on the basic software architecture of a TICK stack [1] (i.e., Telegraf, InfluxDB, Chronograf, Kapacitor; alternatively, TIG [2] with Grafana instead of Chronograf, or CIG with Collectd instead of Telegraf) and how to use it to collect performance data from your environment. In other words, you already know the utilization level of your mass storage media and the kind of performance the CPUs deliver. In addition to plain vanilla performance data such as system load, it is a good idea to keep track of temperatures and voltages, because these have a significant influence on the durability of the server components.

Of course, you could install external sensors and temperature probes in your data center and query them, but before you invest in external devices, it's worth taking a look at the sensors that already exist in the system and discovering how to query them and add their readings to your metrics dashboard. This approach is also recommended if you rent servers from a hardware provider and do not have physical access to the data center.
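As a rough illustration of what querying built-in sensors can look like, the following Python sketch reads temperature values from the Linux hwmon sysfs tree and formats them as InfluxDB line protocol. It is not taken from the tools discussed in the article; the function names, the measurement name hw_temp, and the tag names chip and sensor are assumptions chosen for this example. It relies only on the kernel convention that temp*_input files under /sys/class/hwmon report millidegrees Celsius.

```python
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return a list of (chip, label, degrees_celsius) tuples.

    Walks the hwmon sysfs tree; temp*_input files hold millidegrees Celsius.
    """
    readings = []
    for chip_dir in sorted(Path(base).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            prefix = input_file.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip_dir / f"{prefix}_label"   # optional per-sensor label
            label = label_file.read_text().strip() if label_file.exists() else prefix
            try:
                millidegrees = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor exists but cannot be read right now
            readings.append((chip, label, millidegrees / 1000.0))
    return readings


def _escape_tag(value):
    """Escape commas, equals signs, and spaces in a line protocol tag value."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(readings, measurement="hw_temp"):
    """Format readings as InfluxDB line protocol lines without timestamps.

    The measurement and tag names are hypothetical, picked for this sketch.
    """
    return [
        f"{measurement},chip={_escape_tag(chip)},sensor={_escape_tag(label)} celsius={celsius}"
        for chip, label, celsius in readings
    ]
```

Printing the result of to_line_protocol(read_hwmon_temps()) from a small script would give output that a collector can ingest, for example through Telegraf's exec input with the influx data format. Which chips and labels appear depends entirely on the hardware and the loaded kernel drivers.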

Sensor monitoring can turn up some surprises: For example, when the disk temperatures in a RAID array reveal that one of four disks is permanently 20 degrees warmer than all the others, it explains why the drive failed twice within a year and had to be

...
