How to query sensors for helpful metrics
Measuring Up
Metrics dashboards in Grafana or similar tools show you how your infrastructure is performing. Querying sensors gives you additional information (e.g., voltages, temperatures, and fan speeds) that you can use to analyze and protect against failures. In this article, I look at the tools you can use to query sensors.
Previous articles looked into the use of tools such as InfluxDB, Telegraf, or Collectd to help retrieve metrics from running systems and visualize the results in Grafana. Thus far, however, the articles have focused on the basic software architecture of a TICK stack [1] (i.e., Telegraf, InfluxDB, Chronograf, Kapacitor; alternatively, TIG [2] with Grafana instead of Chronograf, or CIG with Collectd instead of Telegraf) and how to use it to collect performance data from your environment. In other words, you already know the utilization level of your mass storage media and the kind of performance the CPUs deliver. Besides plain vanilla performance data such as system load, it is a good idea to keep track of other data such as temperatures and voltages, because these have a significant influence on the durability of the server components.
Of course, you could install external sensors and temperature probes in your data center and query them, but before you invest in external devices, it's worth taking a look at the sensors that already exist in the system, discovering how to query their information, and adding them to your metrics dashboard. This approach is also recommended for users who rent servers from a hardware provider and do not have physical access to the data center.
Sensor monitoring can turn up some surprises: In one case, the disk temperatures in a RAID array revealed that one of four disks was permanently 20 degrees warmer than all the others, which explained why that drive had failed twice within a year and had to be replaced. Because the user had the recorded metrics to show, they were given a replacement machine.
If you do not run a TIG stack in your data center or on your cloud server, you can employ a simple solution that relies on containers with Podman.
TIG with Podman
An environment with Grafana and InfluxDB can be set up very quickly with containers. Systemd handles the management tasks; the containers act like system services that are started with systemctl and enabled so they come up directly after the system boots. In this scenario, the components operate independently and on their own IP addresses. Therefore, you can try out different constellations and use several versions of InfluxDB at the same time.
For the containers to work with their own IP addresses, you need a server with a network bridge and a Podman network of the bridge [3] type. You will find various how-tos online to help you create a bridged container network of the macvlan type; however, if you use this type of network, the containers cannot communicate with the host itself. In this test setup, the bridge LAN is named pub_net and is described by the /etc/cni/net.d/pub_net.conflist file. The setup runs on a server with RHEL 8, CentOS 8 Stream, or an Enterprise Linux 8 (EL8) clone and with the Podman package installed.
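Such a network definition could look something like the following sketch, which assumes an existing host bridge named br0 and the 192.168.2.0/24 subnet used for the container addresses later in this article; adjust the names and addresses to your environment:

{
  "cniVersion": "0.4.0",
  "name": "pub_net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "br0",
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{ "subnet": "192.168.2.0/24", "gateway": "192.168.2.1" }]
        ],
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    }
  ]
}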
InfluxDB with Flux
Version 2 of InfluxDB introduced a number of significant changes, including the Flux query language as the new default. Many users still prefer the InfluxQL of version 1.x, which also makes integration with Grafana far easier, because Grafana still does not offer a convenient graphical query editor for Flux. A simple trick helps Flux newcomers, though: The new InfluxDB web user interface (UI) provides a graphical query editor of its own. Build your query there and then copy the resulting Flux code into the Grafana panel.
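To give you an idea of what ends up in the panel, a Flux query for the CPU temperatures that Telegraf writes to InfluxDB might look like the following sketch; the bucket, measurement, and field names are assumptions that depend on your Telegraf configuration:

from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "sensors" and r._field == "temp_input")
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)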
The setup with Podman allows for worry-free use of both variants in this workshop, which means you can take your time and get to know both versions before deciding which one you want to use in production. For InfluxDB version 1.8, create a systemd service in the /etc/systemd/system/flux18.service file following the template in Listing 1.
Listing 1
Influx v1.8 systemd Service
[Unit]
Description=Influxdb 1.8
After=network-online.target
Wants=network-online.target

[Service]
ExecStartPre=mkdir -p /var/pods/flux18/etc
ExecStartPre=mkdir -p /var/pods/flux18/data
ExecStartPre=-/bin/podman kill flux18
ExecStartPre=-/bin/podman rm flux18
ExecStartPre=-/bin/podman pull docker.io/influxdb:1.8
ExecStart=/bin/podman run --name flux18 \
  --volume /var/pods/flux18/etc:/etc/influxdb:Z \
  --volume /var/pods/flux18/data:/var/lib/influxdb:Z \
  --net pub_net --ip 192.168.2.81 \
  --mac-address 52:54:C0:A8:02:51 \
  docker.io/influxdb:1.8
ExecStop=/bin/podman stop flux18

[Install]
WantedBy=multi-user.target
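After saving the unit file, make it known to systemd and enable it so the container comes up at boot:

systemctl daemon-reload
systemctl enable flux18.service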
Now when you type
systemctl start flux18
systemd creates two directories under /var/pods/flux18 where the container stores its configuration (etc) and its data (data), which means the information is retained on the host system even after the container is stopped or restarted. You can define the container's IP address and MAC address statically. For tests with InfluxDB 2.x, create an /etc/systemd/system/flux24.service file from exactly the same template: Change the IP and MAC address (here, to .82 and :52), swap flux18 for flux24 in the container name and directories, and change docker.io/influxdb:1.8 to docker.io/influxdb:2.4 in the two lines that reference the image. Because InfluxDB 2.x no longer allows access without a login by default, you need to point your browser to http://192.168.2.82:8086 after starting the container and run through the basic setup wizard. Create two API tokens for Telegraf and Grafana at the same time.
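If you prefer not to edit by hand, the second unit can be derived from the first, for example like this sketch, which assumes the names and addresses used above:

cp /etc/systemd/system/flux18.service /etc/systemd/system/flux24.service
sed -i -e 's/flux18/flux24/g' \
       -e 's/Influxdb 1.8/Influxdb 2.4/' \
       -e 's/192.168.2.81/192.168.2.82/' \
       -e 's/52:54:C0:A8:02:51/52:54:C0:A8:02:52/' \
       -e 's|influxdb:1.8|influxdb:2.4|g' \
       /etc/systemd/system/flux24.service
systemctl daemon-reload
systemctl enable --now flux24.service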
Following the same pattern, create a service declaration for your Grafana container. The name is grafana; the image points to docker.io/grafana/grafana. The two directories for the Grafana container are logs and data:
--volume /var/pods/grafana/data:/var/lib/grafana:Z \
--volume /var/pods/grafana/logs:/var/log/grafana:Z
Moreover, you can specify at the start of the file that you want systemd to start the Grafana container after the Influx container at system boot:
After=network-online.target flux18.service
In the test setup, Grafana has an IP address of .80 and a MAC address of :50. With Grafana, too, you first need to go through the initial setup in the browser at http://192.168.2.80:3000.
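Assembled from these pieces, the Grafana unit (e.g., /etc/systemd/system/grafana.service) might look like the following sketch, which simply follows the flux18 template; the unversioned image tag and the host directory paths are assumptions you can adapt:

[Unit]
Description=Grafana
After=network-online.target flux18.service
Wants=network-online.target

[Service]
ExecStartPre=mkdir -p /var/pods/grafana/data
ExecStartPre=mkdir -p /var/pods/grafana/logs
ExecStartPre=-/bin/podman kill grafana
ExecStartPre=-/bin/podman rm grafana
ExecStartPre=-/bin/podman pull docker.io/grafana/grafana
ExecStart=/bin/podman run --name grafana \
  --volume /var/pods/grafana/data:/var/lib/grafana:Z \
  --volume /var/pods/grafana/logs:/var/log/grafana:Z \
  --net pub_net --ip 192.168.2.80 \
  --mac-address 52:54:C0:A8:02:50 \
  docker.io/grafana/grafana
ExecStop=/bin/podman stop grafana

[Install]
WantedBy=multi-user.target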
Sensors in the System
Depending on your hardware, most motherboards and chipsets already have a whole range of built-in sensors that can be queried by the operating system. The toolset for this is the Linux monitoring sensors suite (lm-sensors), which can be set up on Enterprise Linux systems by typing
yum install lm_sensors
(or using dnf). To help you find existing sensors, the package includes the sensors-detect tool, which detects the chipset sensors, the I2C buses, and the sensors attached to them; it writes the initial module configuration to the /etc/sysconfig/lm_sensors file. A call to the sensors tool lists the sensors and their values. Depending on the system, some of them might not be usable. In some places, lm-sensors finds sensors that do not even exist in the system, often because the hardware manufacturer has installed a connector and a controller, but not the sensor itself. Of course, these fake sensors are immediately apparent, because case temperatures of 0 or -273 degrees are pretty unlikely.
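On an EL8 system, the typical sequence looks roughly like the following sketch; the --auto switch accepts the detection defaults, and if your distribution ships an lm_sensors service unit, enabling it loads the detected modules at boot:

dnf install lm_sensors
sensors-detect --auto
systemctl enable --now lm_sensors.service
sensors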
In practice, however, sensors will at least find the CPU package on almost all systems, which means temperature information for the CPU chip and all of the CPU cores it contains. This alone provides some important insights and is especially valuable for passively cooled edge devices or servers with a small form factor (e.g., Intel NUC). If the system does not have any fan speed sensors, the CPU temperature at least lets you draw conclusions about fan problems (Figure 1).
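If you already collect metrics with Telegraf, the lm-sensors readings can flow straight into one of the InfluxDB containers from this setup. The following snippet is a sketch that assumes the InfluxDB 1.8 instance at 192.168.2.81 and a database named telegraf:

# /etc/telegraf/telegraf.d/sensors.conf
[[inputs.sensors]]
  # runs the sensors binary from lm_sensors
  timeout = "5s"

[[outputs.influxdb]]
  urls = ["http://192.168.2.81:8086"]
  database = "telegraf"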