It’s the Little Things
vmstat
One of those *nix commands that gets no respect is vmstat. However, it can be an extremely useful command, particularly for HPC. vmstat reports virtual memory statistics for a Linux system, along with process, block I/O, interrupt, and CPU activity. Although it has several “modes,” I find the default mode to be extremely useful. Listing 2 is a quick snapshot of a Linux laptop.
Listing 2: vmstat on a Laptop
[laytonjb@laytonjb-Lenovo-G50-45 ~]$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 5279852   2256 668972    0    0  1724    25  965 1042 17  9 71  2  0
 1  0      0 5269008   2256 669004    0    0     0     0 2667 1679 28  3 69  0  0
 1  0      0 5260976   2256 669004    0    0     0   504 1916  933 25  1 74  0  0
 2  0      0 5266288   2256 668980    0    0     0    36 4523 2941 29  4 67  0  0
 0  0      0 5276056   2256 668960    0    0     4     4 9104 6262 36  5 58  0  0
Each line of output corresponds to a system snapshot at a particular time (Table 1), and you can control the amount of time between snapshots; in Listing 2, vmstat 1 5 takes five samples one second apart. The first line of numbers reports averages since the system was last rebooted. Each line after that reports values for the most recent sampling interval. A number of system metrics are very important. The first thing to look at is the number of processes (r and b). If these numbers start moving up, something unusual might be happening on the node, such as processes waiting for run time or stuck in uninterruptible sleep.
Table 1: vmstat Output
vmstat Column | Meaning |
procs | |
r | No. of runnable processes (running or waiting for run time) |
b | No. of processes in uninterruptible sleep |
memory | |
swpd | Amount of virtual memory used |
free | Amount of idle memory |
buff | Amount of memory used as buffers |
cache | Amount of memory used as cache |
swap | |
si | Amount of memory swapped in from disk (blocks/sec) |
so | Amount of memory swapped out to disk (blocks/sec) |
io | |
bi | No. of blocks received from a block device (blocks/sec) |
bo | No. of blocks sent to a block device (blocks/sec) |
system | |
in | No. of interrupts per second, including the clock |
cs | No. of context switches per second |
cpu | |
us | Time spent running non-kernel code (=user time + nice time) |
sy | Time spent running kernel code (=system time) |
id | Time spent idle |
wa | Time spent waiting for I/O |
st | Time stolen from a virtual machine |
The metrics listed as memory can be useful, particularly as the kernel grabs and releases memory. You shouldn't be too worried about these values unless the values in the next section (swap) are non-zero. If you see non-zero si and so values, excluding the first row, you should be concerned, because it indicates that the system is swapping, and swapping memory to disk can really kill performance. If a user is complaining about performance and you see a node running really slowly with a very large load, then there’s a good possibility that the node is swapping.
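The swap check described above can be sketched in a few lines of Python. This is only a minimal sketch: the function names parse_vmstat and swapping_samples are hypothetical, and the code assumes the default-mode column layout shown in Listing 2. It skips the first data row, because that row holds averages since boot.

```python
def parse_vmstat(text):
    """Turn vmstat default-mode output into a list of dicts, one per sample."""
    header = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line or the group banner (procs, memory, ...)
        if fields[0] == "r":
            header = fields  # column names: r b swpd free ...
            continue
        if header is not None and len(fields) == len(header):
            samples.append(dict(zip(header, map(int, fields))))
    return samples


def swapping_samples(samples):
    """Return the samples, after the first, that have non-zero si or so."""
    return [s for s in samples[1:] if s["si"] or s["so"]]


# Data rows copied from Listing 2.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 5279852   2256 668972    0    0  1724    25  965 1042 17  9 71  2  0
 1  0      0 5269008   2256 669004    0    0     0     0 2667 1679 28  3 69  0  0
 1  0      0 5260976   2256 669004    0    0     0   504 1916  933 25  1 74  0  0
"""

samples = parse_vmstat(SAMPLE)
print(len(swapping_samples(samples)), "of", len(samples) - 1, "interval samples show swapping")
```

Because the parser reads the column names from the header line, it does not depend on fixed column positions.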
The metrics listed in the io section are also good to watch. They list blocks received from a block device (bi) and blocks sent to a block device (bo). If both numbers are large, the application running on the node is likely doing something unusual by reading from and writing to the device at the same time. This situation, too, can hurt performance.
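As a rough illustration of that check, the following Python sketch flags samples in which bi and bo are both above a cutoff. The function name busy_io_samples and the default threshold of 1000 blocks/sec are assumptions made for the example; what counts as "large" depends on the storage behind the node.

```python
def busy_io_samples(text, threshold=1000):
    """Return (bi, bo) pairs where both values exceed threshold.

    threshold is a hypothetical cutoff in blocks/sec, not a recommended value.
    The first data row (averages since boot) is skipped.
    """
    bi_idx = bo_idx = None
    width = 0
    data_rows = 0
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if "bi" in fields and "bo" in fields:
            # Column-name line: remember where bi and bo sit.
            bi_idx, bo_idx, width = fields.index("bi"), fields.index("bo"), len(fields)
            continue
        if bi_idx is None or len(fields) != width:
            continue  # prompt line, banner line, or anything before the header
        data_rows += 1
        if data_rows == 1:
            continue  # since-boot averages, not a live sample
        bi, bo = int(fields[bi_idx]), int(fields[bo_idx])
        if bi > threshold and bo > threshold:
            hits.append((bi, bo))
    return hits


# Data rows copied from Listing 2.
SAMPLE = """\
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 5279852   2256 668972    0    0  1724    25  965 1042 17  9 71  2  0
 1  0      0 5260976   2256 669004    0    0     0   504 1916  933 25  1 74  0  0
 0  0      0 5276056   2256 668960    0    0     4     4 9104 6262 36  5 58  0  0
"""

print(busy_io_samples(SAMPLE))
```

For the laptop data in Listing 2, no interval has heavy reads and writes at the same time, so the list comes back empty.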
The other metrics can be very useful, but I tend to focus on those mentioned first before scanning the others. You can also send this data to a file for later postprocessing or plotting – for example, for debugging user problems on nodes.
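One way to capture the data for later plotting is to convert the vmstat output to CSV. The sketch below is one possible approach, not a definitive tool: the function names capture_vmstat and vmstat_to_csv and the output file name are hypothetical, and the code assumes the default-mode output format shown in Listing 2.

```python
import csv
import subprocess


def capture_vmstat(interval=1, count=5):
    """Run `vmstat interval count` and return its text output."""
    result = subprocess.run(
        ["vmstat", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def vmstat_to_csv(text, path):
    """Write the column names once, then every data row, to a CSV file.

    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for line in text.splitlines():
            fields = line.split()
            if not fields:
                continue
            if fields[0] == "r":
                # vmstat repeats its header on long runs; keep only the first.
                if header is None:
                    header = fields
                    writer.writerow(header)
                continue
            if header is not None and len(fields) == len(header):
                writer.writerow(fields)
                rows_written += 1
    return rows_written
```

On a node, a call such as vmstat_to_csv(capture_vmstat(1, 5), "vmstat.csv") would record five one-second samples. Keep in mind that the first row in the file is the since-boot average, so you will probably want to drop it before plotting.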