New Monitoring Tools

If you like ASCII-based monitoring tools, take a look at three new tools – Zenith, Bpytop, and Bottom.

A long time ago, I was a system administrator for a couple of HPC systems, but I also inherited two HP (Hewlett-Packard) N-class servers (mainframes). Along with two WORM storage units, these were the main servers for the core engineering group. They were busy systems, and I had to work hard to keep up with administration, backups, and software patches.

When a CPU failed in one of the servers, and HP discovered that the previous admin had configured them in high availability (HA) mode, even though they were not licensed for HA, I had to “uncouple” the servers. During the process, I lost access to them. For a new admin, this was a bit unnerving, but with the help of some more experienced admins, the servers soon were back up.

During this time, I only had terminal access (ASCII) to the servers, so I used ASCII monitoring tools to help debug the problems. The combination of the stress of getting the servers back in a usable state as quickly as possible and the invaluable help from the ASCII tools indelibly put ASCII monitoring tools on my list of go-to tools and tricks.

I admit I am a sucker for new monitoring tools. Although I’m comfortable with the ones I use, I do enjoy seeing new ones and trying them out. Because of the history I just disclosed, this is particularly true for ASCII-based tools. Recently, I have learned about some new tools that look interesting.

Zenith

Zenith, a new monitoring tool to me, monitors and presents ASCII charts of an extensive list of metrics. You can even zoom in on these charts, which is something I have not seen before, and manipulate processes, including changing the priority or sending signals to them. The available metrics include:

  • CPU, memory, network, and disk usage charts
  • Top users of CPU, memory, and disk
  • A process filter table (kind of like Top)
  • Nvidia GPU utilization metrics
  • Summary of free disk space, NIC IP address, and CPU frequency
  • Great battery information for laptops
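As a rough illustration of what changing a priority or sending a signal amounts to on a Unix-like system, here is a minimal Python sketch. It is not Zenith's code: the helper names lower_priority and send_signal are hypothetical, and the demo acts only on a throwaway child process that it starts itself.

```python
import os
import signal
import subprocess
import sys


def lower_priority(pid: int, niceness: int = 19) -> int:
    """Raise the nice value of a process (lowering its priority).

    Returns the nice value the kernel reports afterward. Raising the nice
    value of your own processes needs no special privileges; lowering it
    usually requires root.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)


def send_signal(pid: int, sig: signal.Signals = signal.SIGTERM) -> None:
    """Deliver a signal to a process, as a monitor's signal menu would."""
    os.kill(pid, sig)


# Demo on a disposable child process so nothing important is touched.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
new_nice = lower_priority(child.pid)
send_signal(child.pid)
exit_code = child.wait(timeout=10)

# A negative exit code means the child was ended by that signal number.
print(f"niceness={new_nice}, exit code={exit_code}")
```

An interactive monitor essentially wraps calls like these in a menu, so you do not have to look up the process ID and type the renice or kill command yourself.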

By default, Zenith comes with a multisection interface (Figure 1).

Figure 1: Zenith interface on startup.

At the very top is a quick summary line showing the system name, the kernel version, length of time the system has been up, and length of time it has been taking data, including the time interval for which data exists.

Below the top line, Figure 1 shows five sections for my system. From top to bottom are the CPU performance chart, the network performance chart, the disk/filesystem performance chart, a graphics chart (for my Nvidia GPU), and the task interface at the bottom. To skip between the sections, just press the Tab key. The current section is outlined in red (the task interface in this example).

Within each chart you also get text information about the status above and below the chart. For example, in the CPU chart at the top, you can see CPU time information, as you would see in Top, as well as the top user and application (in this case, user laytonjb and the zenith application). You also see the memory usage, swap usage, and number one user of memory (user laytonjb and the WebExtensions application).

To zoom in to a section, just press the e key to “expand” (Figure 2). If you want to reduce the zoom, use the m key to “minimize” (Figure 3). Think of it as zooming in and out in your web browser. If you like, you can zoom in on all the charts with the plus (+) key. Likewise, you can zoom out on all of them with the minus (-) key.

Figure 2: Expanded interface of the Zenith network chart.

Figure 3: Minimized interface of the Zenith network chart.

The charting capability is particularly impressive. Expanding (zooming in) or minimizing (zooming out) is very simple. One of my favorite features is scrolling back in time (Figure 4).

Figure 4: Going back in time with Zenith.

You can tell the chart has gone back in time because at the very top it says (-00:05:34), or 5 minutes, 34 seconds in the past. Scrolling back is possible because Zenith saves system performance data as it runs. By default, Zenith records data at two-second intervals.
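To make the relationship between the sampling interval and the time offset concrete, here is a minimal Python sketch of a history buffer with scroll-back. It is a generic illustration under my own assumptions, not a description of how Zenith stores its data; the History class, its max_samples limit, and the offset_label helper are all hypothetical.

```python
from collections import deque


class History:
    """Fixed-size buffer of samples taken at a regular interval."""

    def __init__(self, interval_s: float = 2.0, max_samples: int = 1800):
        self.interval_s = interval_s
        # A bounded deque drops the oldest sample once it is full.
        self.samples = deque(maxlen=max_samples)

    def record(self, value: float) -> None:
        self.samples.append(value)

    def window(self, width: int, seconds_back: float = 0.0) -> list:
        """Return up to `width` samples ending `seconds_back` in the past."""
        skip = int(seconds_back // self.interval_s)
        end = max(len(self.samples) - skip, 0)
        start = max(end - width, 0)
        return list(self.samples)[start:end]


def offset_label(seconds_back: int) -> str:
    """Format an offset in the style of the chart header, e.g. (-00:05:34)."""
    hours, rest = divmod(seconds_back, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"(-{hours:02d}:{minutes:02d}:{seconds:02d})"


history = History()
for i in range(300):  # 300 samples = 10 minutes at 2 seconds each
    history.record(float(i))

print(offset_label(334))                    # (-00:05:34)
print(history.window(5, seconds_back=334))  # [128.0, 129.0, 130.0, 131.0, 132.0]
```

With a two-second interval, scrolling back 5 minutes, 34 seconds simply means skipping the 167 most recent samples before drawing the chart.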

I have been looking for an ASCII-based monitoring tool that is capable of monitoring CPU and the Nvidia GPU for some time. I had thought Bashtop might do that, but the author wanted something generic for all GPUs, which doesn’t exist. Now Zenith has delivered. In the previous four figures you can see the GPU performance chart (fourth from the top) with a summary of the statistics just below.

Bpytop

Previously, I wrote a quick introduction to Bashtop. This particularly good monitoring tool has some great capabilities, including real-time ASCII plots. Remarkably, the tool is written entirely in Bash scripting.

At the time of that article, the author of the tool thought about switching to Python to make coding easier and to add capability. Bpytop, the Python port of Bashtop, is the result (Figure 5).

Figure 5: Bpytop initial screen.

When I took the screenshot you see, I was running an OpenMP application that used all the cores – six real cores and six hyperthreading (HT) cores. The CPU chart at the top is at peak for all the cores. You can also see the CPU load at top right with the load for each core.

Overall, the interface is remarkably close to that of Bashtop. One exception is the storage information on the left-hand side, just below the CPU chart. Bpytop has both memory usage and usage per disk.

To highlight the top process in the process window at bottom right, press the Enter key; you will see a screen like that shown in Figure 6. Notice that a small window opens just above the process table listing CPU usage, memory usage, the user, and more about that process.

Figure 6: Bpytop process information.

To toggle the various sections on and off, use the number keys. Pressing 1 turns off the CPU “box” (the information), and pressing it again makes the CPU box reappear. A 2 keypress turns the memory box on and off, a 3 keypress turns the network box on and off, and a 4 keypress turns the process box on and off.

Moreover, you can use the keys in combination. For example, from the starting interface, you can press the 1-2-3 keys simultaneously and end up with only the process box. If you press the 2-3 key combination, you end up with the CPU box and the process box (Figure 7). If you turn off the memory box (which also holds the disk information), leaving only the CPU, network, and process boxes, the interface looks like Figure 8.

Figure 7: Bpytop CPU and process boxes.

Figure 8: Bpytop CPU, network, and process boxes.

Bottom

The graphical process and system monitor Bottom is in the same vein as Zenith and Bpytop and is inspired by gtop and gotop, which are derived from vtop. Bottom can create great ASCII plots of CPU usage; memory usage, including RAM and swap usage; and network usage for send and receive data. The tool can also provide information about disk capacity, I/O operations per second (IOPS), and sensor information – primarily temperatures.
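If you are curious how a series of usage samples can become a plot made of characters, the following Python sketch shows the basic idea. It is deliberately simplistic and is not how Bottom draws its charts (tools of this kind typically use finer-grained characters and colors); the ascii_chart function and its parameters are hypothetical.

```python
def ascii_chart(values, height=4, lo=0.0, hi=100.0):
    """Render values as a column chart, one character column per sample.

    Each text row stands for a band of the value range; a cell is filled
    when the sample exceeds the lower edge of that band.
    """
    rows = []
    for level in range(height, 0, -1):  # draw the top row first
        threshold = lo + (hi - lo) * (level - 1) / height
        rows.append("".join("#" if v > threshold else " " for v in values))
    return "\n".join(rows)


cpu_percent = [0, 25, 50, 75, 100]
print(ascii_chart(cpu_percent))
```

For the rising series in the example, the output is a staircase: the last column is filled in all four rows, and the first column is empty.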

A good process management section gives you some insight into the resources used by each process. Of course, it can be used to kill processes, if needed. An additional feature that can be overlooked is that Bottom is cross-platform: It can run on Linux, macOS, and Windows. Not many of the previous generation of Top-like tools can do this.

After installation, just run Bottom with the btm command. I have lm_sensors installed on my test system, and I had run some code just before issuing the command, so when it started, Bottom looked like Figure 9.

Figure 9: Bottom initial screen.

The screen is broken into “widgets.” The top left widget is a CPU usage chart. To the right of that is a table listing the CPU usage. The widgets in the next row down, left to right, are a chart of memory usage, then a table of sensor values, with a table of mounted filesystems below that. The bottom row of widgets is, left to right, a network usage chart and a process table. To move between widgets, click on a window or use the Ctrl+<arrow key> combination. The selected widget will have a blue border.

In Figure 10, I ran OpenMP code that used all of the processors, which you can see in the CPU usage table. Notice that the CPU listings are all at 100%. The red curve, according to the CPU table, is the AVG (average) of all of the cores.
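Per-core percentages like these are commonly derived on Linux from two readings of /proc/stat taken a short time apart. The Python sketch below shows that calculation on made-up sample text; it is an assumption about the general technique, not Bottom's implementation, and the function names and the simple mean used for AVG are my own choices for illustration.

```python
def parse_proc_stat(text):
    """Map 'cpu0', 'cpu1', ... to (busy, total) ticks from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 user nice system idle iowait ...";
        # the aggregate "cpu" line is skipped here.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            ticks = [int(x) for x in fields[1:9]]  # user through steal
            idle = ticks[3] + ticks[4]             # idle + iowait
            counters[fields[0]] = (sum(ticks) - idle, sum(ticks))
    return counters


def utilization(before, after):
    """Percent busy per core between two snapshots, plus a simple average."""
    usage = {}
    for core, (busy1, total1) in after.items():
        busy0, total0 = before[core]
        elapsed = total1 - total0
        usage[core] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    average = sum(usage.values()) / len(usage) if usage else 0.0
    usage["AVG"] = average
    return usage


# Made-up snapshots; on a real system, read /proc/stat twice instead.
before_text = """cpu  150 0 150 1700 0 0 0 0 0 0
cpu0 100 0 100 800 0 0 0 0 0 0
cpu1 50 0 50 900 0 0 0 0 0 0
"""
after_text = """cpu  350 0 250 1800 0 0 0 0 0 0
cpu0 250 0 150 800 0 0 0 0 0 0
cpu1 100 0 100 1000 0 0 0 0 0 0
"""

usage = utilization(parse_proc_stat(before_text), parse_proc_stat(after_text))
print(usage)
```

In this sample, cpu0 was busy for the whole interval and cpu1 for half of it, so the average lands between the two. When every core is pegged, as in Figure 10, the average curve sits at 100% as well.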

Figure 10: Screenshot of Bottom when running OpenMP code and using all the cores.

Details of the command-line arguments when invoking Bottom are on the GitHub page. Bottom also has a very comprehensive set of options for managing processes, including searches, which also can be found on the GitHub page.

I have not explored Bottom too much. However, I really do like its ASCII plot capability. You can zoom into, or expand, the widget by pressing e when a widget is selected. Figure 11 shows an example of expanding the CPU usage chart.

Figure 11: Screenshot of Bottom with expanded view of CPU usage when running OpenMP code.

Summary

I have always found ASCII-based monitoring tools to be of extreme value, even when not administering a huge cluster. I am always on the lookout for new ASCII tools. In this article, I looked at three new tools, all of which are worthy of inclusion in your list of go-to tools when the chips are down.