Log analysis in high-performance computing

State of the Cluster

Classification and Ordering

You can mix all of your logs into a single file or database, but this can make analysis difficult because the logs contain different types of information. For example, a log entry from a user logging onto a system has information that is different from a network storage device logging data throughput over the last five seconds.

To organize the logs better, you can sort the messages into classes, tag them with keywords, or use both techniques.
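
As a rough illustration of combining both techniques, the following Python sketch assigns each message one class and any number of keyword tags. The class names, regular expressions, and keywords are assumptions made up for this example; a real cluster would need rules that match its own log formats.

```python
import re

# Hypothetical class rules (class name, pattern); adapt to your own logs.
CLASS_RULES = [
    ("auth", re.compile(r"\b(sshd|login|session opened)\b", re.IGNORECASE)),
    ("storage", re.compile(r"\b(throughput|MB/s|nfs|lustre)\b", re.IGNORECASE)),
]

# Hypothetical keyword-to-tag mapping.
TAG_KEYWORDS = {"error": "error", "failed": "failure", "timeout": "timeout"}


def classify(message):
    """Return the name of the first class whose pattern matches, else 'other'."""
    for name, pattern in CLASS_RULES:
        if pattern.search(message):
            return name
    return "other"


def tag(message):
    """Return a sorted list of tags whose keyword appears in the message."""
    lowered = message.lower()
    return sorted({t for keyword, t in TAG_KEYWORDS.items() if keyword in lowered})


def organize(message):
    """Attach a class and tags to a raw log message."""
    return {"class": classify(message), "tags": tag(message), "message": message}
```

A message gets exactly one class but can carry several tags, so a failed login and a failed disk write would land in different classes while still sharing a tag you can search on.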

Pattern Recognition

Pattern recognition tools are typically used to filter incoming messages (log entries) according to a set of known patterns. This method allows common events to be separated out or handled differently from events that do not fit the pattern.

Although filtering might sound like a way to avoid collecting and storing so much log information, it has two drawbacks: (1) you have to define the benign patterns yourself, and (2) you can lose quite a bit of information (more on that later in the article).
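
A minimal Python sketch of this kind of filter might look like the following. The two benign patterns are hypothetical placeholders, not a recommended list. The function returns the matching entries instead of discarding them, so the decision about what to throw away stays with you.

```python
import re

# Hypothetical patterns for entries considered routine; define your own.
BENIGN_PATTERNS = [
    re.compile(r"session (opened|closed) for user \w+"),
    re.compile(r"CRON\[\d+\]"),
]


def split_by_pattern(lines, patterns=BENIGN_PATTERNS):
    """Split log lines into (known, unknown) by matching against known patterns."""
    known, unknown = [], []
    for line in lines:
        if any(p.search(line) for p in patterns):
            known.append(line)    # fits a known pattern
        else:
            unknown.append(line)  # does not fit; deserves a closer look
    return known, unknown
```

You could then archive or summarize the known list and forward only the unknown list for review.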

Correlation Analysis

Correlation analysis is a way to find all of the entries associated with an event, even if they are in different logs or have different tags. This method can be more important than you might think: If a user's application doesn't run as well today as it did yesterday, you have to determine whether any new events occurred before the latest run. More specifically: What happens when a user's application crashes? Do any events in the logs across various devices explain what could have caused this problem?
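
One simple way to approach this, sketched below in Python, is to collect every entry from every source that falls within a time window before the event. The entry layout (a dictionary with `time`, `source`, and `message` keys) and the 10-minute default window are assumptions for illustration. The sketch also assumes the clocks on all devices are synchronized, and real correlation tools typically match on more than time alone.

```python
from datetime import timedelta


def correlate(entries, event_time, window=timedelta(minutes=10)):
    """Return entries from any source logged within `window` before the event.

    Each entry is assumed to be a dict with a datetime under "time".
    Results are sorted oldest first.
    """
    start = event_time - window
    hits = [e for e in entries if start <= e["time"] <= event_time]
    return sorted(hits, key=lambda e: e["time"])
```

Calling `correlate(all_entries, crash_time)` with entries merged from node, switch, and storage logs would return a single timeline leading up to the crash, which you can then narrow down by node or tag.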
