Log analysis in high-performance computing
State of the Cluster
Examples of Log Analysis Stacks and Tools
Log analysis tools and stacks differ widely, and each is flexible and runs in its own way. Some are assembled from various components, usually open source, and others are monolithic tools, so examining them all would be impractical. Instead, I'm going to focus on a log analysis stack concept that illustrates the tools that fulfill the various tasks described at the beginning of the article. In this way, I hope to point you in the right direction for deploying a log analysis capability.
Splunk
Although I'm a big supporter of open source, Splunk [3], a commercial product, is probably the gold standard for a combined log collection and log analysis tool. It is the benchmark security information and event management (SIEM) tool against which all others are compared. Honestly, though, Splunk is pretty costly for an HPC system and may be overkill.
Splunk came out in 2003 and was very quickly a big success. Arguably, it was the first enterprise-grade log collection and analysis tool that could monitor logs from many different devices and applications and locate patterns in real time. It also uses machine learning in its analysis, which was very innovative at the time and set the bar for similar tools. Splunk has a great number of features, is easy to install, uses data indices and events, and can ingest data from many sources. For an enterprise-grade tool, all of these features made it unique and powerful.
Sites started using Splunk, and its popularity kept growing. However, as I mentioned, it's fairly expensive, particularly for HPC systems. That cost has led to the development of log analysis stacks and tools that compete with Splunk while being open source, less expensive, or both.
Splunk Alternatives
Given the preeminence of Splunk and its associated cost, people naturally look for Splunk alternatives [4]. The cited article presents a few options but is not necessarily comprehensive. From it, I gleaned the following:
- LogDNA (commercial with open source agent; software as a service (SaaS))
- Elastic Stack (aka the ELK stack), which includes Elasticsearch (search and analytics engine), Logstash (log collection and transformation), Kibana (visualization), and Beats (data shippers, not originally part of ELK, but added to form the Elastic Stack)
- Fluentd (open source)
- Sumo Logic (commercial)
- Loggly (uses open source tools, including Elasticsearch, Apache Lucene, and Apache Kafka)
- Graylog (open and commercial versions)
Most of the open source tool stacks have a company behind them that offers support, proprietary plugins, additional capability, or a combination of features for a price.
Rather than dig into all of these tools and stacks, I'm going to discuss the ELK stack and its components briefly. This stack was the original open source log analysis stack designed to replace Splunk, but it has since morphed into the Elastic Stack with the addition of Beats, a collection of data shipper tools.
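To make the division of labor among the components concrete, here is a minimal toy sketch in plain Python. It does not use any Elastic software or API. The function and class names (ship, transform, TinyIndex), the sample log lines, and the hostnames are all hypothetical stand-ins, chosen only to mirror the roles described above: a shipper forwards raw lines (Beats-like), a transform step parses them into structured events (Logstash-like), and an inverted index makes them searchable (Elasticsearch-like). The final print loop is a stand-in for the visualization role.

```python
import re
from collections import defaultdict

# Hypothetical syslog-style lines; hostnames and messages are invented.
RAW_LINES = [
    "Jan 12 03:14:07 node001 sshd[2211]: Failed password for root from 10.0.0.5",
    "Jan 12 03:14:09 node002 kernel: EDAC MC0: 1 CE memory read error",
    "Jan 12 03:15:30 node001 slurmd[901]: launch task 4242.0 request from UID 1000",
]

# Assumed line layout: timestamp, host, program with optional [pid], message.
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]+)\s(?P<host>\S+)\s"
    r"(?P<program>[^\[:]+)(?:\[\d+\])?:\s(?P<message>.*)$"
)


def ship(lines):
    """Shipper role (Beats-like): forward raw lines unchanged."""
    yield from lines


def transform(lines):
    """Transform role (Logstash-like): parse lines into structured events."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # lines that do not fit the assumed layout are skipped
            yield match.groupdict()


class TinyIndex:
    """Search role (Elasticsearch-like): a toy inverted index on message words."""

    def __init__(self):
        self.events = []
        self.postings = defaultdict(set)  # word -> set of event positions

    def add(self, event):
        doc_id = len(self.events)
        self.events.append(event)
        for word in re.findall(r"\w+", event["message"].lower()):
            self.postings[word].add(doc_id)

    def search(self, word):
        ids = self.postings.get(word.lower(), ())
        return [self.events[i] for i in sorted(ids)]


index = TinyIndex()
for event in transform(ship(RAW_LINES)):
    index.add(event)

# Visualization role (Kibana-like) reduced to printing the matching events.
hits = index.search("error")
for hit in hits:
    print(hit["host"], hit["program"], hit["message"])
```

A real deployment replaces each of these pieces with a networked service, but the flow of data (ship, transform, index, query) follows the same order.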