How to query sensors for helpful metrics

Measuring Up

Querying by SNMP

Many active components on the LAN support SNMP, which is rarely used today to manage the components actively, but it is excellent for querying and collecting metrics (Figure 2). SNMP protocol versions 1 and 2 are considered insecure. However, almost all devices with SNMP restrict the operation of these protocols to the LAN and only support read-only mode. With the net-snmp package installed on the system, Telegraf can read individual SNMP values or complete tables and forward them to InfluxDB. Which values these are usually depends on the manufacturer's own SNMP management information base (MIB). The MIB provides information on the existing object IDs (OIDs) and their addresses. Detailed documentation of custom SNMP MIBs can be found on the vendor websites.
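Before wiring an OID into Telegraf, it is worth spot-checking it from the shell with the net-snmp tools. The OID below is the APC battery temperature OID used in Listing 2; the IP address is a placeholder for your device:

```shell
# Read a single value with SNMP v2c and the read-only community string
snmpget -v2c -c public 192.168.2.50 1.3.6.1.4.1.318.1.1.1.2.2.2.0

# Walk the vendor subtree to discover which OIDs a device actually offers
snmpwalk -v2c -c public 192.168.2.50 1.3.6.1.4.1.318
```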

Figure 2: Sensor data is being fed by SNMP into the metrics dashboard from the online UPS. Unlike a standby UPS, the double converter filters out all mains fluctuations.

In the example with a UPS by APC (Schneider Electric), a suitable configuration for the manufacturer's custom MIB is shown in Listing 2. The first inputs.snmp entry opens the UDP connection to the management adapter. The inputs.snmp entries that follow gather information on the battery temperature and the input and output voltages. You can also field similar metrics from switches or routers this way.

Listing 2

UPS Metrics

[[inputs.snmp]]
   agents = ["<IP address of the UPS management boards>:161"]
   version = 2
   community = "public"
   name = "snmp"
[[inputs.snmp.field]]
   name = "Battery Temperature"
   oid = "1.3.6.1.4.1.318.1.1.1.2.2.2.0"
[[inputs.snmp.field]]
   name = "PowerIN"
   oid = "1.3.6.1.4.1.318.1.1.1.3.2.1.0"
[[inputs.snmp.field]]
   name = "PowerOUT"
   oid = "1.3.6.1.4.1.318.1.1.1.4.2.1.0"

Collecting Data with Telegraf

You have many different ways to retrieve sensor data from the tools and send the data to Influx. For example, you can write your own scripts that either communicate directly with the InfluxDB API on port 8086 of the InfluxDB server or send the data by way of the InfluxDB client.

In most cases, however, metric collectors such as Telegraf or Collectd are used on the systems. Both come with input modules that directly support all of the sensor sources presented above. In the /etc/telegraf/telegraf.conf Telegraf configuration file, the output definitions precede the input modules. In line with the example, you want the monitored host to deliver its data to both the InfluxDB 1.8 and 2.4 hosts. For Influx 1.8, that entry would be:

[[outputs.influxdb]]
   url = "http://192.168.2.81:8086"
   database = "telegraf"

Things are a little more complicated with InfluxDB 2.4: You must first create an API token, the organization, and a bucket in the basic configuration of the InfluxDB pod:

[[outputs.influxdb_v2]]
   urls = ["http://192.168.2.82:8086"]
   token = "<token>"
   organization = "<organization>"
   bucket = "<bucket>"
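If the InfluxDB 2.4 instance has not been initialized yet, the token, organization, and bucket can be created in one step with the influx CLI inside the pod. A sketch, assuming the stock influx client; the user, organization, and bucket names are placeholders:

```shell
# One-time initialization of a fresh InfluxDB 2.x instance; the command
# prints the generated API token when it completes
influx setup --username admin --password '<admin-password>' \
   --org '<organization>' --bucket '<bucket>' --force
```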

If you specify both outputs in your telegraf.conf file, Telegraf sends all metrics to both Influx pods. Alternatively, you could use Prometheus instead of InfluxDB; to avoid going beyond the scope of the workshop, I will not look at this option here. You can also append the input sources directly in /etc/telegraf/telegraf.conf.

For debugging purposes, however, it is a good idea to create a separate CONF file for each input in /etc/telegraf/telegraf.d/; you can then very easily type

telegraf --test --config /etc/telegraf/telegraf.d/<sensor>.conf

to test the output of the sensor sources individually and check the supplied data.

To use lm-sensors, you simply need a CONF file named sensors.conf with a one-liner: [[inputs.sensors]]. Of course, you do need to have installed and configured the lm_sensors toolset on the system up front. The same is true for IPMI input. The [[inputs.ipmi_sensor]] line in the appropriate CONF file is all you need.

Things get a little more complex with SMART or hddtemp. Both tools need root privileges, but the Telegraf client works in userspace and therefore cannot access the devices in /dev. The hddtemp tool works around the problem in an elegant way with its own daemon, which you can enable by typing

systemctl enable hddtemp --now

The daemon then serves up the queried disk temperatures in userspace over a simple TCP port on localhost:7634. With the hddtemp service running, the one-liner [[inputs.hddtemp]] in the hddtemp.conf file is all you need.
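With the daemon running, you can check that it answers with any TCP client, such as netcat; the reply is typically a pipe-separated list of device, model, temperature, and unit:

```shell
# Read the temperature report from the hddtemp daemon's TCP port
nc localhost 7634
```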

SMART, on the other hand, does not work around root privileges. To use this input, you need to give the telegraf account root privileges for the smartctl tool by creating a /etc/sudoers.d/telegraf file with the following content:

telegraf ALL = NOPASSWD:/usr/sbin/smartctl
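A quick way to verify the sudoers rule is to run smartctl the same way the Telegraf daemon would:

```shell
# -n makes sudo fail immediately instead of prompting if the rule is
# missing; --scan merely lists the SMART-capable devices
sudo -u telegraf sudo -n /usr/sbin/smartctl --scan
```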

You can then enable the input in the Telegraf smart.conf configuration file:

[[inputs.smart]]
   use_sudo = true

Unfortunately, smartctl caused problems in my various lab setups, especially on the HP microserver with a low-powered Celeron CPU. The SMART input caused extremely high CPU loads, affecting the running services. Only hddtemp is used on this system.

To retrieve the values of a UPS managed by NUT, you need an IP connection to the NUT server and a suitable monitoring user, including a password, which you have defined on the NUT server in /etc/ups/upsd.user. The entry in the /etc/telegraf/telegraf.d/ups.conf file is then:

[[inputs.upsd]]
   server = "<server IP address>"
   port = 3493
   username = "<ups-user>"
   password = "<ups-user-password>"

If NUT and Telegraf are running on the same host, leave out the server line.
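Before pointing Telegraf at the NUT server, you can list the variables it exports with the upsc client that ships with NUT; the UPS name and address here are placeholders:

```shell
# Dump all variables NUT publishes for the named UPS
upsc myups@192.168.2.90
```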

Telegraf Without a Plugin

The choice of plugins for Telegraf is huge, but you could still be confronted with proprietary metrics sources of which Telegraf is not aware. You could, on the one hand, use a script that communicates directly with the InfluxDB API, as mentioned at the beginning. However, if you want to send data by Telegraf only, you can use its [[inputs.exec]] option instead. Telegraf then runs a given script (Bash, Python, Perl, etc.) and routes the data back to InfluxDB. The exec input expects the response data in InfluxDB line protocol, <measurement> <field>=<value> format; for example:

room_temperature kitchen=20

A matching entry in the Telegraf configuration would then be:

[[inputs.exec]]
   commands = ["sh /etc/telegraf/script/script1.sh"]
   timeout = "5s"
   data_format = "influx"
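A minimal script for the entry above might look like the following sketch; the measurement name is made up, and the 1-minute load average merely stands in for whatever proprietary sensor readout you want to collect:

```shell
#!/bin/sh
# Example exec-input script: print one metric in InfluxDB line
# protocol (<measurement> <field>=<value>) on stdout
load1=$(cut -d' ' -f1 /proc/loadavg)
printf 'custom_sensor load1=%s\n' "$load1"
```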

It is important to note that Telegraf executes the custom script in its own user context (i.e., with the rights of the telegraf user). Here, users sometimes get into a spin when testing their inputs. When running

telegraf --test

with root privileges, everything works, whereas the Telegraf daemon running in the telegraf user context suddenly stops delivering values. Make sure you run your tests in the right context.
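To reproduce what the daemon actually sees, run the test under the telegraf account rather than as root:

```shell
# Runs the input test with the telegraf user's (lack of) privileges
sudo -u telegraf telegraf --test --config /etc/telegraf/telegraf.d/<sensor>.conf
```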
