Virtuous Benchmarks: Using Benchmarks to Your Advantage

Database of Results

The most common problem I encountered as an admin was user applications that did not run or ran poorly. This type of problem can be difficult to tackle because of the myriad possible causes.

One of the first things to do in tackling these problems is to test the nodes that seem to be causing the performance problems. To do this, you need to know what kind of performance to expect from the nodes. Don’t forget that you have a very nice database of test results you can use for this testing. Of course, these tests might not “tickle” the node(s) in the same way a user application does, but at least you have a starting point for debugging the node.

In addition to debugging the node itself, the database results can help track down network problems. With the node group tests from the database in hand, you can re-run the small node group tests across the set of nodes you suspect are not performing well and see how the results compare with the database.

Admins also update system software from time to time (e.g., a security update or a new version of a compiler or library). To determine whether the nodes are performing well after the update, you can simply re-run the tests and compare the results to the database. If the results are as good as or better than the previous results, you are golden. If the results are worse, you might have some triage time ahead. Regardless, you should keep the post-upgrade test results as a new baseline for comparison.
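The post-upgrade comparison can be as simple as checking each new result against the stored baseline with a tolerance. A minimal sketch follows; the test names, the values, and the 5% tolerance are illustrative assumptions, not part of any particular benchmark suite:

```python
# Sketch: flag benchmark results that regressed after a system update.
# Assumes "higher is better" numbers (e.g., MB/s or MFLOPS).

def compare_to_baseline(baseline, current, tolerance=0.05):
    """Return (test, baseline_value, current_value) tuples for tests whose
    current result is more than `tolerance` below the baseline."""
    regressions = []
    for test, base_val in baseline.items():
        cur_val = current.get(test)
        if cur_val is None:
            continue  # test was not re-run; nothing to compare
        if cur_val < base_val * (1.0 - tolerance):
            regressions.append((test, base_val, cur_val))
    return regressions

# Hypothetical per-node results before and after an update
baseline = {"stream_triad": 41000.0, "hpl": 620.0}
current  = {"stream_triad": 40750.0, "hpl": 550.0}

for test, base, cur in compare_to_baseline(baseline, current):
    print(f"{test}: {cur:.0f} vs baseline {base:.0f} -- time to triage")
```

A small tolerance matters here: run-to-run noise of a percent or two is normal, and you only want to chase genuine drops.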

After a firmware upgrade on nodes or switches, I would definitely recommend re-running all of the tests – from single nodes to larger node groups. Be sure to compare these results to the database of results. If the new results are the same as or better than the previous results, life is good. Again, don’t forget to store the new results as a new baseline. If the results are worse and triaging does not turn up much, you might have to roll back the firmware version while you debug the updated firmware with the manufacturer(s). Without a database of benchmark results, though, determining whether you need to roll back would be difficult.

A great way to use these benchmark results is to re-run the tests on nodes periodically by creating some simple jobs, running them, and recording and comparing the results. A simple tool can parse the benchmark results and throw them into the database for comparison with the old results, and you can even use statistical methods in the comparison. If you start to see performance differences between these periodic runs or between the runs and the database, it might be time to take the node(s) out of production for triage.
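The periodic-run idea sketched above needs only a small table of results and a simple statistical test. Here is one possible shape, assuming SQLite for the database and a two-sigma rule for flagging; the table layout, node names, and test names are assumptions for illustration:

```python
# Sketch: store periodic benchmark runs in SQLite and flag nodes whose
# latest result falls well below that node's own history.
import sqlite3
import statistics

def record_result(conn, node, test, value):
    conn.execute("INSERT INTO results (node, test, value) VALUES (?, ?, ?)",
                 (node, test, value))

def is_suspect(conn, node, test, new_value, nsigma=2.0):
    """True if new_value is more than nsigma below the historical mean
    (higher is better)."""
    rows = conn.execute("SELECT value FROM results WHERE node=? AND test=?",
                        (node, test)).fetchall()
    values = [r[0] for r in rows]
    if len(values) < 3:
        return False  # not enough history to run statistics
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return stdev > 0 and new_value < mean - nsigma * stdev

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (node TEXT, test TEXT, value REAL)")
for v in (41000, 40800, 41200, 40950):          # healthy history
    record_result(conn, "node042", "stream_triad", v)
print(is_suspect(conn, "node042", "stream_triad", 30000.0))
```

Comparing a node against its own history, rather than a cluster-wide number, keeps normal node-to-node variation from triggering false alarms.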

I know of one site that, for a period of time, re-ran some simple single-node tests in the scheduler epilogue script. They would pull these results into a single large flat file and constantly run statistics against that file. Although this example is a bit extreme, they were having problems at the time, and it illustrates the usefulness of a performance database.
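Even the flat-file approach yields useful statistics with very little code. A minimal sketch, assuming a hypothetical one-line-per-run format of "node test value":

```python
# Sketch: run simple statistics over an epilogue-style flat file of
# benchmark results. The file format is an assumption for illustration.
import statistics

def summarize(lines):
    """Group 'node test value' lines by test; return {test: (mean, stdev)}."""
    by_test = {}
    for line in lines:
        node, test, value = line.split()
        by_test.setdefault(test, []).append(float(value))
    return {test: (statistics.mean(v),
                   statistics.stdev(v) if len(v) > 1 else 0.0)
            for test, v in by_test.items()}

log = [
    "node001 stream_triad 41000",
    "node002 stream_triad 40800",
    "node003 stream_triad 29500",   # this node stands out
]
mean, stdev = summarize(log)["stream_triad"]
print(f"mean {mean:.0f}, stdev {stdev:.0f}")
```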

Summary

Benchmarks have been used nefariously by both vendors and customers, but they don’t have to be used for evil purposes; instead, they can be very useful to admins.

For example, debugging a user’s application when it isn’t running well is always difficult; however, you have an advantage if you have a set of baseline performance benchmarks in your back pocket. An excellent way to start is to check the nodes the user’s application runs on. In particular, I would briefly take the nodes out of production and check their performance by repeating the exact tests used to create the performance database and comparing the results.

I hope I’ve convinced you that benchmarks can be amazingly useful for admins, and even users.

Related content

  • Using benchmarks to your advantage
    A collection of single- and multinode performance benchmarks is an excellent place to start when debugging a user's application that isn't running well.
  • Measuring the performance health of system nodes
    Many HPC systems check the state of a node before running an application, but not very many check that the performance of the node is acceptable before running the job.
  • Performance Health Check
    Many HPC systems check the state of a node before running an application, but not very many check that the performance of the node is acceptable before running the job.
  • ClusterHAT
    Inexpensive, small, portable, low-power clusters are fantastic for many HPC applications. One of the coolest small clusters is the ClusterHAT for Raspberry Pi.
  • Favorite benchmarking tools
    We take a look at three benchmarking tool favorites: time, hyperfine, and bench.