HPC resource monitoring for users

Close Companion

Example 2

Example 2 is basically the same as Example 1, but it uses the OpenMP version of the Poisson solver [7]. In this case, all of the cores in the system are used.

Notice in Figure 5 that the application execution time with OpenMP is much shorter when using four cores than when using one core. Figure 6 shows CPU usage versus time for Example 2. Remember, this is an OpenMP application running on all cores, which is reflected by 96%-100% core utilization during the entire run. Figure 7 shows a plot of memory usage and Figure 8 a plot of Ethernet usage during the application run.

Figure 5: Remora output for Example 2.
Figure 6: Example 2 CPU utilization plot.
Figure 7: Example 2 memory utilization plot.
Figure 8: Example 2 Ethernet utilization plot.
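
To put a number on "much shorter," you can compute the speedup and parallel efficiency from the wall-clock times of the two runs. The following is a minimal sketch; the function names are made up for illustration, and the timings are placeholders rather than the values in Figure 5, so substitute the run times you actually measured.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-core run time to multi-core run time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, cores):
    """Speedup divided by core count; 1.0 means perfect scaling."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup(t_serial, t_parallel) / cores


if __name__ == "__main__":
    # Placeholder timings in seconds (NOT taken from the article's figures).
    t_one_core = 100.0
    t_four_cores = 30.0
    print(f"speedup:    {speedup(t_one_core, t_four_cores):.2f}x")
    print(f"efficiency: {parallel_efficiency(t_one_core, t_four_cores, 4):.0%}")
```

An efficiency well below 1.0 on a run that shows near-100% core utilization suggests the cores are busy but not all doing useful work, which is worth a closer look at the memory plot.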

Summary

HPC admins are always looking for better ways to monitor the systems for which they are responsible by understanding how the hardware is operating and seeing how user applications are performing. Many tools and techniques – both hardware and software – are available for monitoring systems. Even though you can find tools and techniques to coordinate monitoring with resource managers (job schedulers), all of these are administrator-oriented tools.

Users have precious few tools to monitor the resources their applications are using. With "application telemetry" information of this kind, users can understand the resource usage pattern of their application, whether it seems to be performing correctly or incorrectly, what resources it consumed, and how it is balanced across several nodes in the system, or even within a single node.

Remora from TACC can gather this information for you and create plots to help guide you to a better understanding of your application without noticeably affecting its performance. Typically, the system administrator installs Remora, but users can install it in their accounts, as well.

Tuning the Remora installation is possible, particularly around what is monitored. Once installed, you just put the command remora before the command that runs the application, and you start gathering information. A few environment variables adjust how Remora gathers the data, but for the most part, it just silently gathers the data for you.
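
If you launch jobs from a script, the "put remora in front" pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names and the program `./poisson_omp` are hypothetical, and `REMORA_PERIOD` is assumed to be the variable that sets the sampling interval in seconds, so check the Remora documentation for your installed version before relying on it.

```python
import os
import shlex
import subprocess


def remora_command(app_cmd):
    """Prefix the application's argument list with the remora command."""
    if not app_cmd:
        raise ValueError("app_cmd must contain at least the program name")
    return ["remora", *app_cmd]


def remora_env(period_seconds=None, base_env=None):
    """Copy the environment and optionally set the sampling period.

    REMORA_PERIOD is an assumed variable name; verify it against the
    Remora documentation for your version.
    """
    env = dict(os.environ if base_env is None else base_env)
    if period_seconds is not None:
        env["REMORA_PERIOD"] = str(period_seconds)
    return env


def run_under_remora(app_cmd, period_seconds=None):
    """Run the application under Remora (requires remora on your PATH)."""
    return subprocess.run(
        remora_command(app_cmd),
        env=remora_env(period_seconds),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical application name; print the command instead of running it.
    cmd = remora_command(["./poisson_omp"])
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when the application takes arguments that contain spaces.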

Remora is a great tool for users who want an idea of their application resource usage. Not pure profiling, Remora is really a combination of profiling and system monitoring. Easy to install and fairly light on resource usage, Remora can be a great help to users.

The Author

Jeff Layton has been in the HPC business for almost 25 years (starting when he was 4 years old). He can be found lounging around at a nearby Frys enjoying the coffee and waiting for sales.
