Lead Image © bowie15, 123RF.com

Collecting application logfiles with Kubernetes

Attentive Co-Driver

Article from ADMIN 66/2021
Modern scale-out environments with containers make log collection difficult. We present concepts and methods for collecting application logfiles with a sidecar container in Kubernetes environments.

When a large, monolithic virtual machine (VM) hosting all services is replaced by a group of cooperating containers, each running a single service, the containerized application can scale the number of containers per service as required. This improves availability and performance and allows the individual components of the application to be changed separately (e.g., to complete an update).

Web applications are the classic example of a modern scale-out architecture. A number of containers handle the database back end, and still others host a classic network filesystem for static content. A scaling group runs the application's web front end. Redundant containers with a message bus or a key value store provide communication for all the components involved. The front-end application developers can dynamically change and update their part of the application without disturbing the functionality of the back end or other components.

However, a scale-out architecture also comes with a whole series of challenges – and not just for application developers. Administrators of a scaling environment also need to stay on top of things: both metrics and logs. In monolithic scenarios, the administrator can simply integrate a log collector client into the application VM and provide it with static configurations. However, this scenario no longer works for dynamically scaling environments, where logs come from a great many containers with ever-changing names and addresses.

Most commercial Kubernetes implementations take care of metrics out of the box because the management layer needs information such as the CPU load and memory usage of individual pods and containers (e.g., to trigger scale-up or scale-down tasks). However, logs are a different story because users have to take care of collecting the application logs themselves.

Old Friends Reach Their Limits

Operating system templates for containers need to stay as small as possible and only include the bare essentials, with no extensive init system such as systemd and no log services such as journald or syslog. The goal is to start only one service with the container. Of course, you do not want a container to collect log information on the temporary local filesystem. Many start scripts therefore simply call the desired service in "foreground mode," such as

exec httpd -DFOREGROUND

in an Apache container.

In this way, all the log output ends up on the console by way of the well-known stdout and stderr data streams. From there, container managers such as Docker or Podman can retrieve and process the output. These services provide log drivers for this purpose that forward the stderr and stdout outputs of a container to a collector.

In the simplest form, Docker, Podman, and Kubernetes write the container logs in their own file formats, so you can retrieve them with:

docker|podman|kubectl logs <containerid>

The list of available drivers includes syslog, journald, and fluentd, all three of which forward the log output of running containers to the respective server running on the container host itself. Syslog and journald are fine for single installations and developer and test environments. However, if you need a better overview, you will hardly be able to avoid qualifying and grouping logs centrally, which is where fluentd [1] comes in.
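
To give an idea of how a container is connected to one of these drivers, a sketch with Docker's fluentd driver (assuming a fluentd daemon is already listening on the host's default forward port 24224 and a hypothetical image name) would be:

docker run --log-driver=fluentd --log-opt fluentd-address=127.0.0.1:24224 --log-opt tag=httpd my_httpd_image

The same --log-driver option accepts syslog or journald instead, in which case the respective host service picks up the container output.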

However, a simple fluentd scenario assumes that the service itself is running on the container host (or in a container on the host) and that the container only outputs a log over stdout, which in turn limits scaling. In a larger Kubernetes environment, IT managers will want to collect logs per associated application rather than per host. Further complicating matters, various applications do not easily dump their log information to stdout/stderr.

In some scenarios, the service in the container generates multiple logs in different formats, which would make evaluation by stdout far more difficult. For example, if a PHP application runs on an Nginx web server in a container, up to four log outputs are generated: the access and error logs of the Nginx server, the log of the PHP interpreter, and the log of the PHP application.

Sidecar for the Log Collector

As an alternative to logging to stdout and collecting on the host, you can provide application containers with a companion dedicated to logging. This concept is referred to as a "sidecar container": an additional container within a pod that handles log collection for a single container or a whole group of containers (Figure 1). As the application grows and more pods start, each application service is given its own sidecar.

Figure 1: Complex Kubernetes applications comprise dozens of pods. Within these pods, sidecar containers can collect the log information.

As a passenger, the sidecar container can intercept and process the stdout output of the container being logged. In this mode, however, the logging container must include the $HOSTNAME in its standard output so that the sidecar container can qualify the logs and keep multiple sources apart. In practice, however, it is far more common for applications to write their logs to a separate logfile. Again, the sidecar container can retrieve the information, provided the setup gives the application and the sidecar a shared volume. Here, you configure your containerized application to put its log output into different directories on the shared storage medium; the sidecar container mounts the same medium and points the log collector at it. A corresponding Kubernetes deployment (or excerpts of it) will then look like that in Listing 1. (See the "Pod, Container, Replica Set, and Deployment" box.)

Listing 1

Binding the Sidecar Container

apiVersion: apps/v1
kind: Deployment
    spec:
      containers:
      - name: application
        image: my_application
        volumeMounts:
        - name: logdir1
          mountPath: /var/log/1
        - name: logdir2
          mountPath: /var/log/2
      - name: logcollector
        image: my-log-collector
        volumeMounts:
        - name: logdir1
          mountPath: /var/log/1
        - name: logdir2
          mountPath: /var/log/2
        - name: config
          mountPath: /etc/service
      volumes:
      - name: logdir1
        emptyDir: {}
      - name: logdir2
        emptyDir: {}
      - name: config
        configMap:
          name: my-service-config

Pod, Container, Replica Set, and Deployment

Often, even IT professionals get the concepts of pods, containers, replica sets, and deployments mixed up, so here's a quick run-through of the terms:

A container ideally runs only one service, with a minimal operating system runtime and the application itself.

A pod comprises at least one container and the metadata important to Kubernetes, such as environments and variables. It can contain several directly related containers. However, the containers within a pod cannot scale independently. In the example here, a pod contains the container with the application and a container with the log shipper.

A replica set defines how Kubernetes scales and operates the pods. The set specifies parameters such as the minimum active pods on separate hosts or the lower and upper limits on running pods. For non-scaling applications, however, the replica set can also specify failover rules, such as running one active and one passive pod.

Finally, a deployment is the declarative description of how an application should run. It describes the desired pods and, if necessary, several replica sets (front end, back end, database, etc.) and ensures that enough pods are always running as long as the deployment is active.
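
To make the relationship more tangible, a minimal deployment sketch (all names are placeholders) shows how the declared replica count and the pod template belong together; Kubernetes derives the replica set from it and keeps the requested number of pods alive:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
spec:
  replicas: 3                # the derived replica set keeps three pods running
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
      - name: application
        image: my_application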

Services such as fluentd, the lightweight Fluent Bit, or Filebeat from Elastic can run in the log collector container. You can use existing community images as templates or create your own images with Buildah, which requires a matching service configuration. With Docker or Podman, you would copy a customized configuration file into the image with the COPY instruction.
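
A minimal sketch of such an image build with a Containerfile (assuming a customized fluent-bit.conf next to it and the upstream Fluent Bit image, whose default configuration path is used here) might look like this:

FROM docker.io/fluent/fluent-bit:latest
# Bake the customized log shipper configuration into the image
COPY fluent-bit.conf /fluent-bit/etc/fluent-bit.conf

followed by a build with buildah bud -t my-log-collector . or the equivalent podman build call.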

A Kubernetes deployment accomplishes this in a slightly more elegant way, in that you can store the content of the service configuration for this purpose in a ConfigMap. The deployment then binds this API object to the log shipper container with a volume (named config), which allows you to store a whole range of different log shipper configurations for different services in your Kubernetes environment. Listing 2 shows an example of Fluent Bit and Nginx in the application container.

Listing 2

Binding Fluent Bit and Nginx

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-service-config
data:
  fluent-bit.conf: |
    [INPUT]
        Name   tail
        Tag    nginx.access
        Parser nginx
        Path   /var/log/1/access.log
    [INPUT]
        Name   tail
        Tag    nginx.error
        Parser nginx
        Path   /var/log/2/error.log
    [OUTPUT]
        Name   forward
        Match  *
        Host   my.elasticsearch.host
        Port   24224

The data field of the ConfigMap can contain several files if required by the log shipper you use. One of the few problems here is not so much the sidecar concept as the log shipper protocols. In the example given, Fluent Bit forwards the qualified log data to the host at the fictitious address my.elasticsearch.host but uses the non-standard port 24224 to do so. This works as long as the Elasticsearch-Fluent Bit-Kibana (EFK) stack operates outside the Kubernetes cluster and can accept data on port 24224.

Inside a Kubernetes cluster, however, this turns out to be a bit more difficult. Here, "routers" forward the inbound traffic to the destination, although the term is not technically appropriate because it is a reverse proxy. The reverse proxy in turn usually only sends ports 80, 443, and possibly 6443 (Kubernetes API) to the cluster but not non-standard ports such as 24224. You need to check your log shipper to see whether its protocol can be forwarded like HTTPS through a reverse proxy. Alternatively, some Kubernetes distributions allow you to bind services to the static addresses of individual Kubernetes nodes, which allows data to be sent to non-standard ports without a router.
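
As a sketch of the latter option, a NodePort service could expose the forward port of a log collector running inside the cluster on each node's address (the service name and label are placeholders; the nodePort must lie in the cluster's NodePort range, by default 30000-32767):

apiVersion: v1
kind: Service
metadata:
  name: fluentd-forward
spec:
  type: NodePort
  selector:
    app: fluentd
  ports:
  - name: forward
    port: 24224
    targetPort: 24224
    nodePort: 30224   # reachable on every node IP

The log shippers then send their data to any node address on port 30224.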

More Sidecars for Deleting

When logging to files, the application consumes disk space. Systems tend to fail because full logfiles occupy all free space. As soon as containerized applications use shared storage, you also risk overflow. In the example in Listing 2, the containerized application will happily write log data to a shared volume. The log sidecar processes this data but does not delete the data when the work is done. To prevent disk overflow, you need to integrate another sidecar into the pod that takes care of log rotation. A number of ready-made log rotate images for Kubernetes can be found online that take care of archiving and deleting old log data.
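
A sketch of such an additional container in the pod from Listing 1 (the image name and interval are placeholders; any image that ships the standard logrotate tool will do) could look like this:

      - name: logrotate
        image: my-logrotate-image
        volumeMounts:
        - name: logdir1
          mountPath: /var/log/1
        - name: logdir2
          mountPath: /var/log/2
        command: ["/bin/sh", "-c"]
        args:
        - while true; do logrotate -s /tmp/logrotate.status /etc/logrotate.conf; sleep 3600; done

The logrotate configuration itself can be supplied through the same ConfigMap mechanism as the log shipper configuration.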
