Log analysis in high-performance computing
State of the Cluster
Elastic Stack (ELK)
ELK stands for Elasticsearch, Logstash, and Kibana, three tools assembled into a complete, fully open source log management and analysis stack that competes with Splunk. The steps for log collection, management, and analysis, and the ELK tools that fulfill each step, are:
- log collection (Logstash)
- log/data conversion and formatting (Logstash)
- log search/analysis (Elasticsearch)
- visualization (Kibana)
A fourth tool, Beats, a collection of lightweight data shippers, was later added to the stack, which was renamed Elastic Stack.
The ELK stack was a big hit because it was completely open source and provided a good portion of the capability found in Splunk. Its popularity grew quickly, and even Amazon Web Services (AWS) offers the ELK stack components as managed services that can be used together and with other AWS services.
The Elastic company [5] develops the tools in the ELK and Elastic stacks, offers support, and develops commercial plugins.
Gathering the Data
Logstash gathers and transforms logs from the various servers: it ingests logs or data from the specified sources, then normalizes, classifies, and orders them before sending them on to the search engine, which is Elasticsearch in both the ELK and Elastic stacks.
Installing and configuring Logstash is covered in an article from Elastic [6]. Each client system runs a small tool named Filebeat that collects data from files on that server and sends it to the log server running Logstash. Filebeat lets you specify which files to watch: system logs, application logs, or the output of any scripts you create.
In general, Filebeat takes the place of log-gathering tools such as syslog-ng or rsyslog, but it isn't strictly necessary. If a server already aggregates logs centrally, you can install Filebeat on that server alone and it will transmit the logs as JSON (JavaScript Object Notation) to the Logstash server, which can even be the same machine, easing the upgrade from simple log collection to full log analysis.
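To give a feel for what travels over the wire, the following Python sketch hand-ships a single log event to Logstash, which is roughly what Filebeat does continuously for every file it watches. It assumes a Logstash tcp input configured with the json_lines codec; the hostname, port, and field names are hypothetical.

```python
import json
import socket
import time

# Hypothetical Logstash endpoint: a tcp input with codec => json_lines.
LOGSTASH_HOST = "logserver.example.com"
LOGSTASH_PORT = 5000

# A log event in the rough shape shippers pass along: a flat JSON object
# with a timestamp, the originating host, the source file, and the raw line.
event = {
    "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "host": socket.gethostname(),
    "source": "/var/log/messages",
    "message": "kernel: Out of memory: Kill process 4321 (mpirun)",
}

# json_lines framing: one JSON document per newline-terminated line.
with socket.create_connection((LOGSTASH_HOST, LOGSTASH_PORT)) as sock:
    sock.sendall((json.dumps(event) + "\n").encode("utf-8"))
```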
Searching the Data
Logstash gathers the logs and transforms them into a form (JSON) that the search engine, Elasticsearch [7], can consume. Elasticsearch is an open source tool based on the Apache Lucene library [8]. You can configure Elasticsearch to gather whatever information you need or want, but the defaults [9] are a good place to start.
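If you want to poke at Elasticsearch from a script, indexing a JSON document is nearly a one-liner with the Python client. The following is a minimal sketch that assumes the 8.x elasticsearch Python package and a node listening on the default port; the index and field names are illustrative.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

# Assumes a local node on the default port; adjust the URL for a real cluster.
es = Elasticsearch("http://localhost:9200")

# Store one log event; the "syslog" index is created on first use.
doc = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "host": "node042",
    "message": "slurmd: error: health check failed",
}
es.index(index="syslog", document=doc)
```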
Besides being open source, Elasticsearch has some attractive features. One aspect that HPC users will understand and appreciate is that it is distributed. If you have lots of data and want to improve performance, you can shard data across several servers.
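Sharding is configured per index when the index is created. A minimal sketch with the same Python client might look like the following; the index name and shard counts here are assumptions for illustration, not recommendations.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Spread the hypothetical "cluster-logs" index over three primary shards,
# each with one replica, so indexing and search load is distributed across
# the nodes that hold those shards.
es.indices.create(
    index="cluster-logs",
    settings={"number_of_shards": 3, "number_of_replicas": 1},
)
```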
Elasticsearch also has near real-time performance, perhaps close to Splunk's, that gives you quick insight into problems so you can address them just as quickly. As data is ingested, Elasticsearch indexes it, which is very useful if you want to see all of the data related to an event, look back at the system logs and data, or both.
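Querying that index is equally direct. The sketch below searches the hypothetical syslog index from the earlier example for out-of-memory messages in the past hour, combining a full-text match with a time-range filter on the timestamp field.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Find everything logged around a suspected event: match a phrase in the
# message field and constrain the window with a range filter on @timestamp.
resp = es.search(
    index="syslog",
    query={
        "bool": {
            "must": [{"match": {"message": "Out of memory"}}],
            "filter": [
                {"range": {"@timestamp": {"gte": "now-1h", "lte": "now"}}}
            ],
        }
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["@timestamp"], hit["_source"]["message"])
```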
The core of Elasticsearch is written in Java, but it has clients in various languages, including, naturally, Java, but also .NET (C#), PHP, Apache Groovy, Ruby, JavaScript, Go, Perl, Rust, and Python. These choices provide a great deal of flexibility in how you work with Elasticsearch.
In addition to developing all the tools in the stack – Logstash, Elasticsearch, Kibana (more on that in the next section), and Beats – Elastic also created the Elastic Cloud service.