How to query sensors for helpful metrics
Measuring Up
Collectd Directly or with Detours
As a metric collector, Collectd takes an approach virtually identical to that of Telegraf, at least if you use InfluxDB version 1.8, which accepts metrics from Collectd directly. In this example, the configuration for the InfluxDB 1.8 container resides in the /var/pods/flux18/etc/influxdb.conf file on the host:
[[collectd]]
  enabled = true
  bind-address = "192.168.2.81:25826"
  database = "collectd"
  typesdb = "/usr/share/collectd/types.db"
As of version 2, InfluxDB no longer has a direct Collectd input. If you collect metrics with this tool, you have to detour by way of a Telegraf instance that is configured with the input
[[inputs.socket_listener]]
  service_address = "udp://:25826"
  data_format = "collectd"
followed by an [[outputs.influxdb_v2]] section, as described earlier in the text. Collectd supports all of the inputs mentioned so far. A config example in /etc/collectd.conf is:
LoadPlugin hddtemp
<Plugin hddtemp>
  Host "127.0.0.1"
  Port "7634"
</Plugin>
<Plugin network>
  Server "192.168.2.81" "25826"
</Plugin>
In this example, Collectd queries the hddtemp daemon on the local host and forwards the values to the InfluxDB 1.8 server.
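To see what Collectd picks up from hddtemp, it can help to read the daemon's output yourself. The following Python sketch is not part of the setup described here; it assumes that hddtemp runs in daemon mode on 127.0.0.1:7634 (as in the Collectd configuration above) and reports each drive as a pipe-separated record of the form |device|model|temperature|unit|. The function names read_hddtemp and parse_hddtemp are hypothetical.

```python
import socket


def read_hddtemp(host="127.0.0.1", port=7634, timeout=2.0):
    """Fetch the raw report from a running hddtemp daemon (assumed address)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The daemon sends its report and then closes the connection.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def parse_hddtemp(report, sep="|"):
    """Split a report such as '|/dev/sda|Model|35|C|' into one dict per drive."""
    fields = report.strip().split(sep)
    records = []
    # Each drive contributes five fields: an empty one, device, model,
    # temperature, and unit.
    for i in range(0, len(fields) - 4, 5):
        _, device, model, temp, unit = fields[i:i + 5]
        records.append({
            "device": device,
            "model": model,
            # Non-numeric values (e.g., for a sleeping drive) become None.
            "temperature": int(temp) if temp.isdigit() else None,
            "unit": unit,
        })
    return records
```

Calling parse_hddtemp(read_hddtemp()) on the monitored host should return one entry per drive; if the list is empty, Collectd's hddtemp plugin will most likely not see any values either.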
Conclusions
The plain old performance metrics of your server and cloud systems appear in a different context when you have access to sensor data. Suddenly, you can link performance drops of a server to an overheated CPU and perhaps even find the root of this problem in the form of a fan that is too slow. You will then also see disk failures in the context of power fluctuations or surges.
Besides plain vanilla monitoring and visualization of sensor data, you can also link the data to alerting and system management. If the intake temperature of any of your servers rises above 40 degrees, you probably have a problem with the air conditioning in the server facility, and you will want to shut down all the systems – or at least the less important ones – to protect your hardware investment.
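The shutdown rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: The temperatures are taken to be degrees Celsius, the readings are assumed to have already been fetched from the time series database, and the function name, the host names, and the notion of a set of "critical" hosts are hypothetical.

```python
def hosts_to_shut_down(intake_temps, critical_hosts, limit_c=40.0):
    """Return the less important hosts to power off, sorted by name.

    intake_temps:   mapping of host name -> intake temperature (assumed Celsius)
    critical_hosts: hosts that should keep running as long as possible
    limit_c:        threshold above which a cooling problem is assumed
    """
    # A single host above the limit is treated as a facility-wide problem.
    if not any(temp > limit_c for temp in intake_temps.values()):
        return []
    # Shut down everything that is not marked as critical.
    return sorted(host for host in intake_temps if host not in critical_hosts)
```

With readings of 41.5 for web1, 30.0 for db1, and 29.0 for build1, and db1 marked as critical, the function returns build1 and web1. An alerting tool would then hand this list to whatever mechanism actually powers the machines down.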