How to query sensors for helpful metrics
Measuring Up
Querying by SNMP
Many active components on the LAN support SNMP, which is rarely used today to manage the components actively but is excellent for querying and collecting metrics (Figure 2). SNMP protocol versions 1 and 2 are considered insecure; however, almost all devices with SNMP restrict the operation of these protocols to the LAN and only support read-only mode. With the net-snmp package installed on the system, Telegraf can read individual SNMP values or complete tables and forward them to InfluxDB. Exactly which values are available usually depends on the manufacturer's own SNMP management information base (MIB). The MIB provides information on the existing object IDs (OIDs) and their addresses. Detailed documentation of custom SNMP MIBs can be found on the vendor websites.
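With net-snmp in place, you can explore which values a device exposes before you build the Telegraf configuration. A quick sketch, assuming SNMP v2c and the common public read community, walks the vendor-specific enterprises subtree:

snmpwalk -v2c -c public <device IP address> 1.3.6.1.4.1

The output lists the available OIDs with their current values, which you can then match against the vendor's MIB documentation.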
In the example of a UPS by APC (Schneider Electric), a suitable configuration for the manufacturer's custom MIB is shown in Listing 2. The first entry, [[inputs.snmp]], opens the UDP connection to the management adapter; the [[inputs.snmp.field]] entries that follow gather information on the battery temperature and the input and output voltages. You can collect similar metrics from switches or routers in the same way.
Listing 2
UPS Metrics
[[inputs.snmp]]
  agents = ["<IP address of the UPS management boards>:161"]
  version = 2
  community = "public"
  name = "snmp"

[[inputs.snmp.field]]
  name = "Battery Temperature"
  oid = "1.3.6.1.4.1.318.1.1.1.2.2.2.0"

[[inputs.snmp.field]]
  name = "PowerIN"
  oid = "1.3.6.1.4.1.318.1.1.1.3.2.1.0"

[[inputs.snmp.field]]
  name = "PowerOUT"
  oid = "1.3.6.1.4.1.318.1.1.1.4.2.1.0"
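Before feeding these OIDs to Telegraf, it is worth checking them manually with the net-snmp tools; for example, to read the battery temperature OID from the listing:

snmpget -v2c -c public <IP address of the UPS management boards> 1.3.6.1.4.1.318.1.1.1.2.2.2.0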
Collecting Data with Telegraf
You have many different ways to retrieve sensor data from the tools and send the data to Influx. For example, you can write your own scripts that either communicate directly with the InfluxDB API on port 8086 of the InfluxDB server or send queries by way of the InfluxDB client.
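For InfluxDB 1.x, such a script can be as simple as a curl call against the write API; a minimal sketch, assuming a database named telegraf and a placeholder host and measurement:

curl -i -XPOST "http://<influxdb-host>:8086/write?db=telegraf" --data-binary "cpu_temp value=42"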
In most cases, however, metric collectors such as Telegraf or Collectd are used on the systems. Both come with input modules that directly support all of the sensor sources presented above. In the Telegraf configuration file, /etc/telegraf/telegraf.conf, the output section precedes the input modules. In line with the example, you want the monitored host to deliver its data to both the InfluxDB 1.8 and 2.4 hosts. For Influx 1.8, that entry would be:
[[outputs.influxdb]]
  url = "http://192.168.2.81:8086"
  database = "telegraf"
Things are a little more complicated with InfluxDB 2.4: You must first create an API token, an organization, and a bucket in the basic configuration of the InfluxDB pod:
[[outputs.influxdb_v2]]
  urls = ["http://192.168.2.82:8086"]
  token = "<token>"
  organization = "<organization>"
  bucket = "<bucket>"
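If the token, organization, and bucket do not exist yet, one possible way to create them is with the influx CLI inside the InfluxDB 2.x pod; a sketch with placeholder names:

influx setup --username admin --password <password> --org <organization> --bucket <bucket> --force
influx auth create --org <organization> --all-access

The second command prints the token that you then paste into the Telegraf output.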
If you specify both outputs in your telegraf.conf file, Telegraf will send all metrics to both Influx pods. Alternatively, you could use Prometheus instead of InfluxDB; to avoid going beyond the scope of this workshop, I will not look at that option here. You can also append the input sources directly in /etc/telegraf/telegraf.conf.
For debugging purposes, however, it is a good idea to create a separate CONF file for each input in /etc/telegraf/telegraf.d/. You can then simply type

telegraf --test --config /etc/telegraf/telegraf.d/<sensor>.conf

to test the output of each sensor source individually and check the supplied data.
To use lm-sensors, you simply need a CONF file named sensors.conf with the one-liner [[inputs.sensors]]. Of course, you do need to have installed and configured the lm_sensors toolset on the system up front. The same is true for the IPMI input: The [[inputs.ipmi_sensor]] line in the appropriate CONF file is all you need.
Things get a little more complex with SMART or hddtemp. Both tools need root privileges, but the Telegraf client works in userspace and therefore cannot access the devices in /dev. The hddtemp tool works around the problem in an elegant way with its own daemon, which you can enable by typing
systemctl enable hddtemp --now
The daemon then serves up the queried disk temperatures in userspace over a simple TCP port on localhost:7634. With the hddtemp service running, the one-liner [[inputs.hddtemp]] in the hddtemp.conf file is all you need.
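To check that the daemon is answering before Telegraf queries it, you can simply read the port, for example with netcat:

nc localhost 7634

The response is a compact string containing the device names and their current temperatures.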
SMART, on the other hand, offers no such workaround for the root privilege requirement. To use this input, you need to give the telegraf account root privileges for the smartctl tool by creating a /etc/sudoers.d/telegraf file with the following content:
telegraf ALL = NOPASSWD:/usr/sbin/smartctl
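A quick way to confirm that the rule works as intended is to call smartctl by way of sudo in the context of the telegraf user, for example:

sudo -u telegraf sudo /usr/sbin/smartctl --scan

If the drive list appears without a password prompt, the Telegraf input will be able to run the tool.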
You can then enable the input in the Telegraf smart.conf configuration file:
[[inputs.smart]]
  use_sudo = true
Unfortunately, smartctl caused problems in my various lab setups, especially on the HP microserver with a low-powered Celeron CPU. The SMART input caused extremely high CPU loads, affecting the running services. Only hddtemp is used on this system.
To retrieve the values of a UPS managed by NUT, you need an IP connection to the NUT server and a suitable monitoring user, including a password, which you have defined on the NUT server in /etc/ups/upsd.users. The entry in the /etc/telegraf/telegraf.d/ups.conf file is then:
[[inputs.upsd]]
  server = "<server IP address>"
  port = 3493
  username = "<ups-user>"
  password = "<ups-user-password>"
If NUT and Telegraf are running on the same host, leave out the server line.
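Whether the NUT server is reachable and serving data at all is easy to check up front with the upsc client that ships with NUT:

upsc <ups-name>@<server IP address>

The command lists all the variables the UPS reports, including the values Telegraf will later collect.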
Telegraf Without a Plugin
The choice of plugins for Telegraf is huge, but you could still be confronted with proprietary metrics sources of which Telegraf is not aware. On the one hand, you could use a script that communicates directly with the InfluxDB API, as mentioned at the beginning. However, if you want to send data by Telegraf only, you can use its [[inputs.exec]] option instead. Telegraf then runs a given script (Bash, Python, Perl, etc.) and routes the data back to InfluxDB. With data_format = "influx", the inputs.exec plugin expects the response data in InfluxDB line protocol, that is, <measurement> <field>=<value>; for example:

room_temperature kitchen=20
A matching entry in the Telegraf configuration would then be:
[[inputs.exec]]
  commands = ["sh /etc/telegraf/script/script1.sh"]
  timeout = "5s"
  data_format = "influx"
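A matching script1.sh could look like the following sketch, in which the value is just a placeholder for a real sensor query:

#!/bin/sh
# Hypothetical inputs.exec script: emit one measurement in InfluxDB line protocol
TEMP=20   # replace with a real sensor reading
echo "room_temperature kitchen=${TEMP}"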
Keep in mind that Telegraf executes the custom script in its own userspace (i.e., with the rights of the telegraf user). Users sometimes get into a spin here when testing their inputs. When running
telegraf --test
with root privileges, everything works, whereas the Telegraf daemon running in the telegraf user context suddenly stops delivering values. Make sure you run your tests in the right context.
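The safest approach is to run the test in the same context as the daemon, for example with a hypothetical exec.conf:

sudo -u telegraf telegraf --test --config /etc/telegraf/telegraf.d/exec.conf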