What to Do with System Data: Think Like a Vegan

Sysadmin needs

Sometimes we need to think a little more like a sysadmin vegan. A good way to start is by focusing on the questions that interest the people who are funding the system or who have spoken up in support of it.

High Level Stats

The high-level questions that typically get asked concern how heavily the system is used and whether it has helped users accomplish their work. Examples of useful information include:

  • Time history of system utilization over the year (just a simple number that indicates how much the system was used)
  • Histogram of job backlog
  • Histogram of job length (how long did the jobs run?)
  • Histogram of the number of nodes or cores requested per job
  • Time history of storage utilization (storage usage as a function of time)
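Several of these metrics can be computed from per-job accounting records, such as those exported by a scheduler's accounting tool (for example, Slurm's sacct). The sketch below is a minimal illustration, not a definitive implementation. It assumes each job has been reduced to a hypothetical `Job` record with start time, end time, and core count, and that `total_cores` is the machine's fixed core count. From those records it derives a daily utilization time history (the fraction of available core-hours actually used) and simple histograms of job length and cores per job. The field names, helper functions, and sample jobs are all made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    # Hypothetical job record; real fields depend on your scheduler's accounting data
    start: datetime
    end: datetime
    cores: int


def daily_utilization(jobs, total_cores, first_day, n_days):
    """Fraction of available core-hours actually used on each day."""
    result = []
    for i in range(n_days):
        day_start = first_day + timedelta(days=i)
        day_end = day_start + timedelta(days=1)
        used = 0.0
        for job in jobs:
            # Overlap between the job's run time and this day, in seconds
            overlap = (min(job.end, day_end) - max(job.start, day_start)).total_seconds()
            if overlap > 0:
                used += overlap / 3600 * job.cores  # core-hours used this day
        result.append(used / (total_cores * 24))
    return result


def histogram(values, bin_width):
    """Count values into equal-width bins keyed by each bin's lower edge."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Made-up sample jobs on a hypothetical 64-core system
jobs = [
    Job(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 12), 16),
    Job(datetime(2024, 1, 1, 18), datetime(2024, 1, 2, 6), 32),
    Job(datetime(2024, 1, 2, 0), datetime(2024, 1, 2, 2), 4),
]
util = daily_utilization(jobs, total_cores=64, first_day=datetime(2024, 1, 1), n_days=2)
job_hours = [(j.end - j.start).total_seconds() / 3600 for j in jobs]

print([round(u, 3) for u in util])                      # [0.25, 0.13]
print(histogram(job_hours, bin_width=4))                # {0: 1, 12: 2}
print(histogram([j.cores for j in jobs], bin_width=16))  # {0: 1, 16: 1, 32: 1}
```

A job that crosses midnight is split between the two days it touches, so the daily numbers never exceed what the machine could actually deliver. For a year of data you would aggregate the daily values by week or month before plotting.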

You can probably think of other metrics that apply to your situation, but don't forget that people higher in the management chain don't want to see the gory details or a lengthy explanation of the data: give them the highlights, but have backup information at the ready.
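One way to turn detailed data into highlights is to reduce it to a handful of headline numbers. The following is only a sketch: it assumes you already have a list of daily utilization fractions and a list of job lengths in hours, and the choice of summary statistics (mean and peak utilization, median and 90th percentile job length) is an assumption, not a standard. `headline_summary` is a hypothetical helper; it uses only Python's standard `statistics` module.

```python
import statistics


def headline_summary(daily_util, job_hours):
    """Condense detailed utilization and job data into a few headline numbers."""
    if len(job_hours) >= 2:
        # quantiles(n=10) returns nine cut points; the last is the 90th percentile
        p90 = statistics.quantiles(job_hours, n=10)[-1]
    else:
        p90 = job_hours[0]
    return {
        "mean utilization (%)": round(100 * statistics.mean(daily_util), 1),
        "peak daily utilization (%)": round(100 * max(daily_util), 1),
        "median job length (h)": statistics.median(job_hours),
        "90th percentile job length (h)": round(p90, 1),
    }


# Made-up inputs for illustration
summary = headline_summary([0.25, 0.5, 0.75], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
for name, value in summary.items():
    print(f"{name}: {value}")
```

The full histograms and time histories remain the backup material; the summary dictionary is what goes on the slide.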

There are times when you need to present the gory details. Perhaps you need to analyze the system utilization yourself, or present details to the sysadmin team, or management wants to dig into the details to get a better understanding.
