Lead Image © Levente Janos, 123RF.com

Admin as a service with sysstat for ex-post monitoring

Facts, Figures, and Data

Article from ADMIN 81/2024
The top, vmstat, and iotop monitoring tools for Linux show the current situation but do not provide any information about the past – which is where sysstat comes into play.

The sysstat package contains numerous tools for monitoring performance and load states on systems, including iostat, mpstat, and pidstat. In addition to tools intended for ad hoc use, sysstat comes with a collection of utilities that you schedule for periodic use (e.g., sar). The compiled statistics range from I/O transfer rates, CPU and memory utilization, paging and page faults, and network statistics through filesystem utilization and NFS server and client activities.

Of course, you could use top, vmstat, ss, and so on to determine the data, but bear in mind that system events are hardly likely to be restricted to the times at which you are sitting in front of the screen. Admins typically receive requests for more detailed information, like: "What exactly happened on system XY between 1:23 and 1:37am?" Modern monitoring systems have sensors capable of detecting such anomalies, but few IT departments configure comprehensive monitoring for every system, and these metrics are often missing from the central monitoring setup. Sysstat gives you the perfect toolbox for these cases.

Installation

To install sysstat under Ubuntu and Debian, just run:

sudo apt install sysstat

By default, sysstat samples the system every 10 minutes with the use of a systemd timer. If you require more frequent measurements, you need to adjust the interval in the /usr/lib/systemd/system/sysstat-collect.timer file (or, so that package updates do not overwrite the change, in a drop-in created with systemctl edit sysstat-collect.timer). You can collect a sample every five minutes by entering *:00/05 instead of the default setting *:00/10:

[Unit]
Description=Run system activity accounting tool every 5 minutes
[Timer]
OnCalendar=*:00/05
[Install]
WantedBy=sysstat.service

Next, tell systemd that the configuration has changed by typing:

systemctl daemon-reload

To enable and start monitoring with sar, you also need to open the /etc/default/sysstat file and change the ENABLED="false" setting to ENABLED="true". You can enable automatic launching of the service at boot time and launch it directly with:

systemctl enable sysstat
systemctl start sysstat

A call to

systemctl status sysstat

should now return an active message.

sysstat at Work

Sysstat writes the measured values in a binary-encoded format to files in the /var/log/sysstat/ folder. By default, the file-naming scheme uses the day of the month (i.e., sa<DD>); if the data collector runs with the -D option, it uses the full date instead (i.e., sa<YYYYMMDD>). When reading, sar always accesses the current file.
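As a small sketch, the naming scheme described above can be reproduced with date, which is handy in scripts that need to pick the right file for sar -f (the path /var/log/sysstat/ assumes the Debian/Ubuntu layout used in this article):

```shell
# Path of today's data file under the default scheme (sa<DD>)
daily="/var/log/sysstat/sa$(date +%d)"
echo "$daily"

# With the collector's -D option, files carry the full date (sa<YYYYMMDD>)
dated="/var/log/sysstat/sa$(date +%Y%m%d)"
echo "$dated"
```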

If you launch sar -q, the tool displays a brief overview of the configuration. The overview contains a line with system data such as kernel, hostname, and date. Another line indicates when sysstat-collect.timer was last restarted. Finally, one line per sample shows measured values (e.g., the size of the run queue and the process list) as well as three calculated values for load average.
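The three load-average columns that sar -q reports (ldavg-1, ldavg-5, and ldavg-15) are the same figures the Linux kernel exposes in /proc/loadavg, which you can verify directly:

```shell
# /proc/loadavg holds the 1-, 5-, and 15-minute load averages,
# followed by running/total task counts and the most recent PID
read ldavg1 ldavg5 ldavg15 tasks lastpid < /proc/loadavg
echo "ldavg-1=$ldavg1 ldavg-5=$ldavg5 ldavg-15=$ldavg15"
```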

Ex-Post Analysis

In this article, I focus on accessing ex-post data (i.e., the state of a system in the past). To access the data for the last two days, sar offers the -1 (yesterday) and -2 (day before yesterday) switches. Regardless of the sensor in which you are interested, sar -1 gives you the previous day's data.

If the measurement data is from further back, you have no choice but to specify the file names with the -f argument. If you explicitly want to view the statistics from January 8, you would define the source file as shown in the first line of Listing 1.

Listing 1

Analysis with sar

01 $ sar -f /var/log/sysstat/sa08 [...]
02 $ sar -f /var/log/sysstat/sa08 -s 23:30:00 -e 23:55:00 [...]
03 $ sar -f /var/log/sysstat/sa07 -s 23:30:00 -e 23:55:00 -P ALL
04 $ sar -f /var/log/sysstat/sa07 -s 23:30:00 -e 23:55:00 -P 1,2

However, filtering by specific days is often not enough, so you can filter for a time interval by passing

-s [<hh>:<mm>:<ss>] -e [<hh>:<mm>:<ss>]

for the start and end times. The command in line 2 of Listing 1 returns metrics between 23:30:00 hours and 23:55:00 hours on January 8.
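If you run this kind of query regularly, the -f/-s/-e pattern from Listing 1 can be bundled into a small helper. The function name sar_window is purely illustrative and not part of sysstat:

```shell
# Hypothetical wrapper around the pattern in Listing 1:
# sar_window <DD> <start> <end> reads /var/log/sysstat/sa<DD>
# and limits the output to the given time interval.
sar_window() {
    day=$1    # two-digit day of month, e.g., 08
    start=$2  # hh:mm:ss
    end=$3    # hh:mm:ss
    sar -f "/var/log/sysstat/sa${day}" -s "$start" -e "$end"
}

# Example: the incident window from the introduction
# sar_window 08 01:20:00 01:40:00
```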

The command returns three data records in addition to the obligatory header line. My systemd timer is set to a 10-minute period, which explains the timestamps of the samples. The measured variables are user, nice, system, iowait, steal, and idle. Unsurprisingly, user shows the proportion of the CPU load assigned to processes in user space. The same applies to nice, with the restriction that only processes running at their own nice level are included.

The system variable shows utilization by the kernel (i.e., the time required for hardware access or software interrupts). Waits, which can occur when accessing hardware, are listed under iowait, and steal records the share of involuntary waits that occur, for example, if you are looking at a virtual machine and the hypervisor distributes CPU time differently. Finally, idle provides information about time when CPUs are idle and no I/O operations are pending.
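These percentages are derived from the raw counters in /proc/stat. As a rough sketch of where the numbers come from: the snippet below reads the aggregate CPU line once, so it covers the whole uptime, whereas sar computes the difference between two samples:

```shell
# The first line of /proc/stat aggregates all CPUs; the fields are
# jiffy counters: user nice system idle iowait irq softirq steal ...
read cpu user nice system idle iowait irq softirq steal rest < /proc/stat
total=$((user + nice + system + idle + iowait + irq + softirq + steal))

# Each field's share of the total, shown here for user and iowait
awk -v u="$user" -v w="$iowait" -v t="$total" \
    'BEGIN { printf "user %.1f%%, iowait %.1f%% (since boot)\n", 100*u/t, 100*w/t }'
```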

Without further details, sar provides CPU utilization, but only as a total for all CPUs. If you want to view individual CPUs or cores, you can run sar with the -P switch. The command in Listing 1 line 3 displays the status of all cores, and the command in line 4 displays the status for cores 1 and 2 only.

Like top, sar displays the average load. In principle, this is an excellent indicator for assessing whether a system is scaled correctly. If the average system load is significantly greater than the number of CPU cores, you are looking at an overload scenario. However, if the value is mostly around zero, the system may be oversized and you could consider withdrawing resources.
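The rule of thumb above can be checked in a few lines of shell, assuming Linux's /proc/loadavg and the coreutils nproc command; the thresholds are only illustrative:

```shell
# Compare the 1-minute load average against the number of CPU cores
load=$(cut -d' ' -f1 /proc/loadavg)
cores=$(nproc)

awk -v l="$load" -v c="$cores" 'BEGIN {
    if (l > c)        print "overload: load " l " exceeds " c " core(s)"
    else if (l < 0.1) print "possibly oversized: load " l " on " c " core(s)"
    else              print "healthy: load " l " on " c " core(s)"
}'
```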

As Figure 1 illustrates, sar can output a considerable volume of detail about system load. The run queue (runq-sz) shows hardly any additional load during the afternoon. The number of processes (plist-sz) is quite high, at around 350, but this number is intentional, being mainly PHP server processes available as spare servers to process requests as quickly as possible. The machine has four cores and is exposed to very moderate utilization, with an average system load of 0.55.

Figure 1: The average system load is moderate in the afternoon.
