Lead Image © Gerard Boissinot, Fotolia.com


System logging for data-based answers

Log Everything

Article from ADMIN 43/2018
By
To be a good HPC system administrator for today's environment, you need to be a lumberjack.

Oh, I'm a lumberjack, and I'm okay,

I sleep all night and I work all day.

– from "Lumberjack" by Monty Python

Can't you just imagine yourself in the wilds of British Columbia swinging your ax, breathing fresh air, sleeping under the stars?!!! I can't either, but Monty Python's "Lumberjack" song has a strong message for admins, particularly HPC admins – Log Everything.

Why log everything? Doesn't that require a great deal of work and storage? The simple answer to both questions is yes. In fact, you might need to start thinking about a small logging cluster in conjunction with the HPC computational cluster. Such a setup gives you the data you need to answer questions.

Answering questions is the cornerstone of running HPC systems. These include questions from users such as, "Why is my application not running?" or "Why is my application running slow?" or "Why did I run out of space?" Logging also answers system administrator questions such as, "What commands did the user run?" or "What nodes was the user allocated during their run?" or "Is the user storing a bunch of Taylor Swift videos?"
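Logging only pays off if the records can be queried afterward. As a minimal sketch, assuming a hypothetical log format in which one line is written per job allocation with `job=`, `user=`, and `nodes=` fields (real schedulers log in their own formats), the following Python answers the question "What nodes was the user allocated during their run?" The function name `nodes_for_user` is illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical allocation record format (an assumption, not a real scheduler's):
#   "2018-01-15T10:02:11 job=4711 user=jdoe nodes=n001,n002"
ALLOC_RE = re.compile(r"job=(?P<job>\S+)\s+user=(?P<user>\S+)\s+nodes=(?P<nodes>\S+)")


def nodes_for_user(log_lines, user):
    """Return {job_id: [node, ...]} for every allocation record belonging to user."""
    jobs = defaultdict(list)
    for line in log_lines:
        match = ALLOC_RE.search(line)
        # Skip lines that are not allocation records or belong to other users.
        if match and match.group("user") == user:
            jobs[match.group("job")].extend(match.group("nodes").split(","))
    return dict(jobs)


if __name__ == "__main__":
    sample = [
        "2018-01-15T10:02:11 job=4711 user=jdoe nodes=n001,n002",
        "2018-01-15T10:05:42 job=4712 user=asmith nodes=n003",
        "2018-01-15T11:17:03 job=4713 user=jdoe nodes=n004",
    ]
    print(nodes_for_user(sample, "jdoe"))
```

The point of the sketch is the design choice, not the parser: if every allocation is logged with the user and node list on one line, the question becomes a simple filter over the logs rather than guesswork.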

If you haven't read about the principle of Managing Up [1], you should. One of the keys to this dynamic is anticipating questions your manager might ask, such as something seemingly as simple as "How's the cluster running?", something with a little more meat to it such as "Why isn't Ms. Johnson's application running?", or perhaps the targeted question, "How could you screw up so badly?" Implicit in these questions are questions from your manager's manager, and on up the chain. Managing up means anticipating the questions or situations that might be encountered up the management chain (answering the Bobs' question about what you actually do). More than likely, management is not

...

