Visualizing data captured by nmon
In Good Time
An excellent article by Jeff Layton [1] on nmon monitoring showed nmon to be a very useful tool for performance assessment and evaluation. My own use of nmon centers on Layton's observation that "Nmon can also capture a great deal of information from the system and produce CSV files for postprocessing. However, the results are typically not easy to postprocess"; hence, you need a tool to visualize the data.
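To see why postprocessing is awkward, consider how the CSV output is laid out. The sketch below assumes the common nmon layout: ZZZZ lines map snapshot tags (T0001, T0002, ...) to wall-clock times, and each section (here CPU_ALL) has one header line followed by one data line per snapshot. The sample data and the `parse_cpu_all` name are made up for illustration. Even extracting a single CPU time series means joining two kinds of lines by tag.

```python
import csv
import io
import re
from datetime import datetime

# A tiny, made-up excerpt in the usual nmon CSV layout (an assumption).
SAMPLE = """\
AAA,host,web01
CPU_ALL,CPU Total web01,User%,Sys%,Wait%,Idle%,Busy,CPUs
ZZZZ,T0001,10:00:00,01-JAN-2024
CPU_ALL,T0001,20.0,5.0,1.0,74.0,,4
ZZZZ,T0002,10:05:00,01-JAN-2024
CPU_ALL,T0002,55.5,10.0,0.5,34.0,,4
"""

TAG = re.compile(r"T\d+")  # snapshot tags look like T0001, T0002, ...


def parse_cpu_all(text):
    """Return [(timestamp, busy%)] from CPU_ALL, with busy% = User% + Sys%."""
    stamps = {}    # snapshot tag -> datetime taken from the ZZZZ line
    header = None  # column names from the CPU_ALL header line
    series = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue
        if row[0] == "ZZZZ":
            stamps[row[1]] = datetime.strptime(f"{row[2]} {row[3]}", "%H:%M:%S %d-%b-%Y")
        elif row[0] == "CPU_ALL":
            if not TAG.fullmatch(row[1]):
                header = row  # header line: remember column positions
                continue
            user = float(row[header.index("User%")])
            system = float(row[header.index("Sys%")])
            series.append((stamps[row[1]], user + system))
    return series


for when, busy in parse_cpu_all(SAMPLE):
    print(when, busy)
```

Looking columns up by header name rather than by position keeps the sketch usable if a given nmon version emits extra columns.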
This is particularly true for big data firms that run thousands of Linux servers and capture very large amounts of information. The visualized information needs to get into reports as quickly as possible, because a short time to answer is what lets you act proactively. With this vast amount of nmon data, you need a comprehensive picture of all server systems quickly, and you also have to be able to drill down through the data to analyze individual servers' performance behavior on an ad hoc basis. To accomplish that objective, I've been using a tool at my company called onTune nmon Analyzer Plus (ONA Plus) from TeemStone [2].
ONA Plus: It's Fast
To use the tool, you first copy the nmon logs to the tool's directory; the application then starts loading the files automatically. Processing 10GB of nmon data takes about 20 minutes on a contemporary Windows laptop, so when you are dealing with nmon logs from thousands of servers, you can usually get a cup of java and come back to find the processing done.
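The quoted figure works out to roughly 0.5GB per minute (about 8MB/s). As a back-of-the-envelope sketch, assuming processing time scales linearly with data volume on the same hardware (an assumption, not a measured property of the tool), you can estimate how long a larger batch will take:

```python
def estimate_minutes(gigabytes, ref_gb=10.0, ref_minutes=20.0):
    """Scale the quoted reference (about 20 minutes per 10GB) linearly."""
    rate_gb_per_min = ref_gb / ref_minutes  # 0.5 GB per minute for the defaults
    return gigabytes / rate_gb_per_min


print(estimate_minutes(10))   # 20.0
print(estimate_minutes(150))  # 300.0, i.e., about five hours
```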
The tool pops up the window in Figure 1 right after the last logfile disappears from the tool's directory. To begin, you choose the start and end dates for the period of interest and select OK. The application then processes the logs, launches a viewer program, and displays the graphs and views.
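If you want to pre-select logs for a date range before copying them into the tool's directory, the file names themselves can help. The sketch below assumes nmon's default output naming, `<hostname>_<YYMMDD>_<HHMM>.nmon`, and a 21st-century two-digit year; `files_in_range` is a hypothetical helper, not part of ONA Plus.

```python
import re
from datetime import date

# Assumes nmon's default naming: <hostname>_<YYMMDD>_<HHMM>.nmon
NAME_RE = re.compile(r"^(?P<host>.+)_(?P<ymd>\d{6})_(?P<hm>\d{4})\.nmon$")


def files_in_range(names, start, end):
    """Keep files whose capture date falls within [start, end], inclusive."""
    selected = []
    for name in names:
        m = NAME_RE.match(name)
        if not m:
            continue  # not an nmon log, or renamed: skip it
        ymd = m.group("ymd")
        # Two-digit year assumed to mean 20YY.
        captured = date(2000 + int(ymd[:2]), int(ymd[2:4]), int(ymd[4:]))
        if start <= captured <= end:
            selected.append(name)
    return selected


logs = ["web01_240101_0000.nmon", "web01_240102_0000.nmon",
        "db01_240105_0000.nmon", "notes.txt"]
print(files_in_range(logs, date(2024, 1, 1), date(2024, 1, 2)))
# ['web01_240101_0000.nmon', 'web01_240102_0000.nmon']
```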
It's Easy, Visual, Interactive
ONA Plus starts with the display shown in Figure 2, a Summary View of all server systems. The servers are grouped according to average and maximum CPU, memory, and paging space utilization, and the view shows the server count for each group. You can also group the servers to reflect physical or logical separation, such as data centers and regions.
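The idea behind such a summary can be sketched in a few lines: compute each server's average and maximum utilization, drop both into bands, and count servers per band. The band boundaries and sample values below are hypothetical; the boundaries ONA Plus actually uses are not described here.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-server CPU busy% samples (e.g., User% + Sys% per snapshot).
samples = {
    "web01": [20, 35, 90],
    "web02": [10, 12, 15],
    "db01": [60, 70, 80],
}

# Illustrative bands: (low inclusive, high exclusive, label).
BANDS = [(0, 25, "0-25%"), (25, 50, "25-50%"), (50, 75, "50-75%"), (75, 101, "75-100%")]


def band_of(value):
    for low, high, label in BANDS:
        if low <= value < high:
            return label
    raise ValueError(f"utilization out of range: {value}")


def summarize(samples):
    """Count servers per band, separately for average and maximum CPU."""
    by_avg, by_max = Counter(), Counter()
    for values in samples.values():
        by_avg[band_of(mean(values))] += 1
        by_max[band_of(max(values))] += 1
    return by_avg, by_max


by_avg, by_max = summarize(samples)
print(dict(by_avg))  # {'25-50%': 1, '0-25%': 1, '50-75%': 1}
print(dict(by_max))  # {'75-100%': 2, '0-25%': 1}
```

Grouping by both average and maximum matters: web01 looks moderate on average but still lands in the top band because of a single spike.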
The server list on the left (shown expanded in Figure 3) gives the main performance criteria for each server, as well as system information (e.g., CPU count, clock speed, memory, IP address). These features make it easy to get a quick overview of your systems while you focus on the real performance analysis task at hand. I consider the list view the most helpful and informative, because all the main performance information for each server is visible at a glance.
Band ratio data, shown in the top left pane of Figure 2 and in the list window below it, is a supplemental indicator used together with the average and maximum values to determine the load on a server. In my studies, I did not really need the band ratio, so I usually just turned it off.
Drilling Down
To drill down into an individual server and analyze its performance behavior, you can use the Direct View and Chartlist Detailed view options. The Direct View option (Figure 4) provides detailed trending charts of each server's basic performance parameters for a selected time range. To aid the analysis, a base chart at the bottom of the application screen depicts the entire time period and lets you choose a shorter time window. Ergo, it is really easy to zoom in further and select a smaller sample period for analysis (Figure 5). Furthermore, you can choose the Predicting Trendline option to generate a simple forecasting graph.
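The algorithm behind Predicting Trendline isn't described here, so the following is only a sketch of the general zoom-and-trend workflow: filter a (timestamp, value) series to a time window, fit an ordinary least-squares line, and extrapolate. The series format, function names, and hourly sample values are hypothetical.

```python
from datetime import datetime, timedelta


def window(series, start, end):
    """Zoom in: keep (timestamp, value) pairs with start <= t < end."""
    return [(t, v) for t, v in series if start <= t < end]


def linear_trend(series):
    """Least-squares line through (hours since first sample, value)."""
    t0 = series[0][0]
    xs = [(t - t0).total_seconds() / 3600 for t, _ in series]
    ys = [v for _, v in series]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # change in value per hour
    return slope, my - slope * mx  # (slope, intercept)


def forecast(series, hours_ahead):
    """Extrapolate the fitted line beyond the last sample."""
    slope, intercept = linear_trend(series)
    x_last = (series[-1][0] - series[0][0]).total_seconds() / 3600
    return intercept + slope * (x_last + hours_ahead)


start = datetime(2024, 1, 1)
cpu = [(start + timedelta(hours=h), v) for h, v in enumerate([10, 12, 14, 16])]
print(len(window(cpu, start + timedelta(hours=1), start + timedelta(hours=3))))  # 2
print(linear_trend(cpu))  # (2.0, 10.0)
print(forecast(cpu, 2))   # 20.0
```

A straight line is the simplest possible forecast; as the article notes, real capacity planning usually calls for something more comprehensive.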
In most scenarios, a more comprehensive forecasting technique is required; nevertheless, the tool provides an interesting feature that can be used to assess, for example, CPU fluctuation via a sine wave. The view I have used most extensively in projects, however, is the Chartlist Detailed option.
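How ONA Plus fits its sine wave is not documented here. As one hypothetical way to put a number on a periodic swing (say, a daily CPU cycle), you can project evenly spaced samples onto a sine and cosine of a known period. This assumes the data span a whole number of periods of at least three samples each, in which case the projections are the least-squares coefficients.

```python
import math


def sine_fit(values, period):
    """Fit v ~ c + a*sin(w*k) + b*cos(w*k), with w = 2*pi/period.

    Assumes evenly spaced samples covering a whole number of periods, so the
    sine and cosine terms are orthogonal and simple projections suffice.
    Returns (mean level c, amplitude of the periodic swing).
    """
    n = len(values)
    if period < 3 or n % period:
        raise ValueError("need period >= 3 and a whole number of periods")
    w = 2 * math.pi / period
    c = sum(values) / n
    a = 2 / n * sum(v * math.sin(w * k) for k, v in enumerate(values))
    b = 2 / n * sum(v * math.cos(w * k) for k, v in enumerate(values))
    return c, math.hypot(a, b)


# Two days of synthetic hourly CPU data: 50% baseline, +/-20% daily swing.
hourly = [50 + 20 * math.sin(2 * math.pi * k / 24) for k in range(48)]
level, amplitude = sine_fit(hourly, 24)
print(round(level, 6), round(amplitude, 6))  # 50.0 20.0
```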
Here, you can display the process ID or command in a split screen directly below the basic performance items (Figure 6). Thanks to the synchronized timeline, you can visually ferret out which process is causing, or could cause, a performance bottleneck or anomaly. Because it provides all of these drill-down features in one analysis environment, the tool can be used to conduct some serious server performance and tuning studies.
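The same correlation can be done by hand from the raw logs. This sketch assumes nmon's TOP section layout (a header line starting with `+PID`, then one line per process per snapshot tag). The sample data is made up and trimmed to a few columns, which is why the code looks up columns by header name. It sums %CPU per command over the snapshots of a spike.

```python
import csv
import io
from collections import defaultdict

# Trimmed, made-up TOP section; real nmon TOP lines carry more columns.
SAMPLE = """\
TOP,%CPU Utilisation
TOP,+PID,Time,%CPU,%Usr,%Sys,Command
TOP,0001111,T0001,5.0,4.0,1.0,sshd
TOP,0002222,T0002,85.0,80.0,5.0,java
TOP,0003333,T0002,10.0,9.0,1.0,backup.sh
TOP,0002222,T0003,90.0,84.0,6.0,java
"""


def busiest_commands(text, tags):
    """Sum %CPU per command over the selected snapshot tags, highest first."""
    header = None
    totals = defaultdict(float)
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or row[0] != "TOP":
            continue
        if row[1] == "+PID":
            header = row  # column names for the data lines that follow
            continue
        if header is None or len(row) != len(header):
            continue  # e.g., the "%CPU Utilisation" title line
        rec = dict(zip(header, row))
        if rec["Time"] in tags:
            totals[rec["Command"]] += float(rec["%CPU"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


print(busiest_commands(SAMPLE, {"T0002", "T0003"}))
# [('java', 175.0), ('backup.sh', 10.0)]
```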
The Filesystem view (Figure 7) conveniently discloses all the filesystems of all the servers and their utilization (%) in one list, as well as detailed filesystem utilization charts on a per-server basis.
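A per-server version of that list can be sketched from the JFSFILE section, assuming the usual layout in which the header line lists mount points after a title field and each data line gives %used per mount for one snapshot. The 90% threshold and the sample values are hypothetical.

```python
import csv
import io

# Made-up JFSFILE excerpt (header: title, then mount points).
SAMPLE = """\
JFSFILE,JFS Filespace %Used web01,/,/boot,/var
JFSFILE,T0001,41.0,30.5,88.0
JFSFILE,T0002,41.2,30.5,93.5
"""


def filesystem_report(text, threshold=90.0):
    """Return {mount: (last %used, peak %used, peak >= threshold)}."""
    mounts, rows = None, []
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0] != "JFSFILE":
            continue
        if mounts is None:
            mounts = row[2:]  # first JFSFILE line is the header
        else:
            rows.append([float(x) for x in row[2:]])
    report = {}
    for i, mount in enumerate(mounts):
        column = [r[i] for r in rows]
        report[mount] = (column[-1], max(column), max(column) >= threshold)
    return report


for mount, (last, peak, alert) in filesystem_report(SAMPLE).items():
    print(mount, last, peak, "ALERT" if alert else "ok")
```

Running this once per server log and concatenating the results gives a rough, all-servers list similar in spirit to the Filesystem view.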