Sustainable Kubernetes with Project Kepler
Power Play
Many parts of the world have just survived the hottest summer on record – not unscathed, in many cases. It's generally accepted that human consumption of fossil fuels is to blame for the drastic and seemingly accelerating effects of climate change. Typical estimates suggest that data centers are responsible for 1.4 percent of global electricity consumption, which, although only a small fraction, still represents hundreds of terawatt hours and, potentially, 90 million tons of CO2 released into the atmosphere [1].
Although data center efficiency and sustainability have advanced in leaps and bounds over the past 10 years, with the advent of hyperscale data centers – meaning that compute capacity has vastly increased while power consumption has remained relatively constant – that's still 90 million tons of CO2 that the planet would much prefer to have still locked up deep under its surface.
Personal Consumption
In this article I show that sustainability is not just a concern for designers of hyperscale data centers: Your own everyday activities of writing and running software have a direct and measurable carbon footprint, so everyone needs to consider the effect of their projects on global energy consumption and the long-term health of our delicate planet.
Even those of us without an electronics background instinctively know that a server's carbon footprint isn't "fixed" whenever it's powered on: Everyone's heard their PC fan get louder when the processor starts working hard. Every CPU instruction or memory I/O operation consumes energy and has a carbon footprint, just like every passenger-mile flown on an aircraft. In this article, I'll look at two easy-to-deploy ways of measuring the carbon footprint of a Kubernetes pod, with the aim of allowing you to practice sustainable usage of computing resources and deliver environmental accountability when designing Kubernetes-based solutions.
Even in its own right, Kubernetes lends itself to efficient management of resources and energy. One of the first things taught in any Kubernetes course is how to specify the CPU and memory "requests" and "limits" of a pod. The Kubernetes scheduler, knowing the available CPU and memory resources of each node in the cluster, is able to pack each node with pods in a Tetris-like fashion, making full use of each node's capacity while allowing each pod to run with the guaranteed availability of its requested resources. Any administrator can easily see their cluster's capacity and utilization in the Kubernetes Dashboard. That's in marked contrast to traditional scenarios of running monolithic applications on servers.
In many such situations, "capacity planning" simply means specifying a server that's comfortably more powerful than required by the application, resulting in unnecessary power overhead. Of course, the capacity planning advantages of Kubernetes are offset by the additional overhead and hosts required to run the Kubernetes control plane itself, but as the scale of the cluster increases, the efficiency benefits are compounded to the point that Kubernetes makes it easier to squeeze more computing power out of a given collection of hosts than if those same hosts were each running some monolithic application.
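The Tetris-like packing described above can be illustrated with a toy first-fit-decreasing placement. This is only a sketch under simplifying assumptions: The real Kubernetes scheduler filters and scores nodes on many more criteria, and the `Node` class, the `first_fit` function, and all resource figures are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with its remaining allocatable resources (hypothetical model)."""
    name: str
    cpu_millicores: int
    memory_mib: int
    pods: list = field(default_factory=list)

    def fits(self, cpu: int, mem: int) -> bool:
        # A pod fits only if both of its requests can still be satisfied.
        return cpu <= self.cpu_millicores and mem <= self.memory_mib

    def place(self, pod_name: str, cpu: int, mem: int) -> None:
        # Reserve the requested resources on this node.
        self.cpu_millicores -= cpu
        self.memory_mib -= mem
        self.pods.append(pod_name)


def first_fit(pods, nodes):
    """Place (name, cpu_request, mem_request) pods, largest CPU request first.

    Returns the names of pods that no node could accommodate.
    """
    pending = []
    for name, cpu, mem in sorted(pods, key=lambda p: p[1], reverse=True):
        for node in nodes:
            if node.fits(cpu, mem):
                node.place(name, cpu, mem)
                break
        else:
            # No node had room for this pod's requests.
            pending.append(name)
    return pending


nodes = [Node("node-a", 2000, 4096), Node("node-b", 2000, 4096)]
pods = [("api", 1500, 1024), ("db", 1000, 1024), ("cache", 500, 1024)]
unscheduled = first_fit(pods, nodes)
for node in nodes:
    print(node.name, node.pods, node.cpu_millicores, "millicores free")
print("unscheduled:", unscheduled)
```

The point of the sketch is only that declared requests make leftover capacity visible and usable, which is what an oversized single-purpose server hides.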
Given the precept that Kubernetes orchestration is a good starting point for a sustainable IT solution, I'll dig deeper to find out how to determine the carbon footprint of an individual workload (pod) running on a Kubernetes cluster. That is the stated aim of Project Kepler, a Cloud Native Computing Foundation (CNCF) Sandbox project that exports Prometheus-compatible power consumption metrics, which can be used to analyze the energy cost and carbon footprint of individual pods and nodes and to optimize the scheduling of workloads accordingly [2].
In this investigation, I use a WiFi-enabled smart switch to measure the effect of a single repeatable workload on the mains (wall socket) power consumption of various host types and investigate how Kepler attempts to do the same by correlating host-level power consumption information with per-process performance counter stats pulled from the kernel by eBPF with a trained machine learning (ML) model. (See the "What is eBPF?" box.)
What is eBPF?
Extended Berkeley packet filter (eBPF) technology allows programs to be run in kernel space without the need to recompile the kernel or compile and load a kernel module. Kepler's eBPF program uses kernel probes to read CPU instruction counters and discover the start and end times for the CPU usage periods of each process.
Prometheus
Before I try different ways of measuring workload power consumption, I'll focus on an essential part of any Kubernetes monitoring solution, and that is the form of its metrics. Every reader with Kubernetes experience will know that Prometheus has emerged as the de facto monitoring and instrumentation standard for Kubernetes. Prometheus is a tool for recording, storing, and querying time series metrics. Created by SoundCloud in 2012 and adopted by the CNCF in 2016 (as their second project after Kubernetes itself), it can be configured to scrape and store any compatible metric you choose to make available to the tool.
Prometheus metrics are multidimensional: They can be tagged with an arbitrary number of key-value pairs known as labels, making it quick and easy to create new metrics and visualize them however you want. Some applications natively generate Prometheus metrics by means of libraries available in many common languages; other data sources are externally instrumented by an intermediary known as an exporter. A common exporter supplied by the Prometheus operator for Kubernetes is the node exporter, which, as the name suggests, reports node-level data such as the total amount of CPU usage in seconds. Linux processes and Kubernetes pods don't natively provide details of their own power consumption in handy Prometheus format – that's why exporters will feature prominently in this article.
The Prometheus application, running as a Kubernetes StatefulSet, determines from which URLs to read metrics through a combination of static scrape configurations supplied by the administrator, Kubernetes ServiceMonitor and PodMonitor objects, and autodiscovery of endpoints. The scraped metrics are stored in its database, which you can then query with the Prometheus query language (PromQL). A common tool for running queries and displaying dashboards is Grafana. Figure 1 shows how all of these components work together to provide the complete monitoring – or observability – stack.
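To give a rough idea of what a PromQL query looks like from the client side, the following Python sketch builds a request for the `query_range` endpoint of the Prometheus HTTP API. The base URL `http://localhost:9090` (a locally reachable Prometheus), the helper names, and the time window are assumptions for illustration; `node_cpu_seconds_total` is the node exporter's CPU-seconds counter.

```python
import json
import urllib.parse
import urllib.request


def build_range_query_url(base_url, promql, start, end, step_seconds):
    """Return a /api/v1/query_range URL for a PromQL expression.

    start and end are Unix timestamps; step_seconds is the resolution.
    """
    params = urllib.parse.urlencode(
        {"query": promql, "start": start, "end": end, "step": step_seconds}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def fetch_series(url, timeout=10):
    """Run the query and return the list of series from the JSON response.

    Needs a reachable Prometheus server, so it is not called in this sketch.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', 'unknown error')}")
    return body["data"]["result"]


# Assumed endpoint and time window, for illustration only.
url = build_range_query_url(
    "http://localhost:9090",
    "rate(node_cpu_seconds_total[5m])",
    1700000000,
    1700000600,
    15,
)
print(url)
```

Grafana issues equivalent requests on your behalf; building the URL by hand is mainly useful for scripted experiments.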
Prometheus Metrics
When the Prometheus database or any other consumer of Prometheus metrics (e.g., Sysdig or OpenTSDB, a distributed, scalable time series database) scrapes an endpoint, it is essentially accessing an HTTP payload of the form:
# HELP esphome_power_sensor Reading from esphome power sensor
# TYPE esphome_power_sensor gauge
esphome_power_sensor{sensorid="sensor-powerplug13", someotherlabel="whatever",....} 18.44799
Tying these three lines together is the metric name itself (esphome_power_sensor in this example). The HELP line gives a human-readable description of the metric's purpose, and TYPE sets one of four core Prometheus metric types: counter, gauge, histogram, or summary. The aim here is to measure and store instantaneous power meter readings, so this article deals mainly with gauge-type metrics; where a source exposes a cumulative counter instead, you query the counter's rate of change.
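The structure of such a payload can be made concrete with a small parser. The following Python sketch is a deliberately minimal reader for simple sample lines and is not a complete implementation of the exposition format (it ignores timestamps, for example); the function name `parse_exposition` and the regular expressions are assumptions made for this illustration.

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (metadata, samples) parsed from exposition-format text.

    metadata maps metric name -> {"help": ..., "type": ...};
    samples is a list of (name, labels_dict, float_value) tuples.
    """
    meta = {}
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # "# HELP <name> <text>" or "# TYPE <name> <type>"
            parts = line.split(None, 3)
            if len(parts) == 4 and parts[1] in ("HELP", "TYPE"):
                meta.setdefault(parts[2], {})[parts[1].lower()] = parts[3]
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this sketch does not understand
        labels = {
            m.group("key"): m.group("val")
            for m in LABEL_RE.finditer(match.group("labels") or "")
        }
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return meta, samples


payload = """\
# HELP esphome_power_sensor Reading from esphome power sensor
# TYPE esphome_power_sensor gauge
esphome_power_sensor{sensorid="sensor-powerplug13"} 18.44799
"""
meta, samples = parse_exposition(payload)
print(meta)
print(samples)
```

In practice, a Prometheus client library's own parser is the safer choice; the sketch only shows how the name, labels, and value relate to one another.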
The example shows the power meter output (18.44799 W) from one of the smart switches discussed later. Prometheus reads each gauge at a set time interval, determined by the relevant scrape configuration, and stores all the values in its local storage, providing the time series data that allows you to determine total energy costs. It makes sense for the scrape interval to be equal to or shorter than the interval at which the sensor or software that generates the metrics updates them, so that no readings are missed.
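Turning a series of stored power readings into an energy figure amounts to integrating power over time. A minimal sketch, assuming the samples are available as `(unix_seconds, watts)` pairs and using the trapezoidal rule (the function names and sample values are hypothetical):

```python
def energy_joules(samples):
    """Integrate (unix_seconds, watts) samples with the trapezoidal rule.

    Returns energy in joules (watt-seconds).
    """
    ordered = sorted(samples)
    total = 0.0
    for (t0, p0), (t1, p1) in zip(ordered, ordered[1:]):
        # Average power over the interval multiplied by its duration.
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total


def joules_to_watt_hours(joules):
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return joules / 3600.0


# Three readings taken 30 seconds apart (made-up values).
readings = [(0, 10.0), (30, 10.0), (60, 20.0)]
joules = energy_joules(readings)
print(joules, "J =", joules_to_watt_hours(joules), "Wh")
```

The shorter the scrape interval relative to how fast the load changes, the closer this approximation comes to the energy actually consumed.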
If your Kubernetes cluster doesn't yet have Prometheus and Grafana available, follow the instructions online [3] to install the Prometheus Operator on your cluster and check that you can access both the Prometheus and Grafana web user interfaces (UIs) by means of kubectl port forwarding to ports 9090 and 3000, respectively. Port forwarding with kubectl is a simple way of setting up network connectivity from your PC to ClusterIP services running on Kubernetes, but it does require you to have kubectl installed on your PC and configured to use the context of your target cluster.
Now that you have a working infrastructure for storing and dashboarding time series metrics, consider how to generate pod-level power measurements and configure Prometheus to scrape them. This article's main focus – Project Kepler – is essentially an exporter of pod-level power and carbon footprint metrics, and it's entirely reliant on (a) hardware-level power information from the host itself and (b) process-level information gleaned from the kernel by eBPF.
What is RAPL?
The running average power limit (RAPL) is a power management and measurement interface that was first introduced in Intel's Sandy Bridge architecture. Its primary aim is to allow a desired average power consumption level to be specified in software; the processor then optimizes its frequency to try to achieve this power level. Kepler is able to use RAPL to read the power consumption of the CPU (broken down into various subsystems). In cases where Kepler is unable to retrieve power info from RAPL, such as in a virtualized environment, it will use a "total power estimator" instead. I tried this on a DigitalOcean droplet, but it did not generate any metrics.
Modern servers and processors come with plenty of tools for measuring and controlling power consumption (e.g., the Advanced Configuration and Power Interface (ACPI) and the RAPL interface; see the "What is RAPL?" box), but you are asked to place a lot of trust in these tools if you really want to know for sure how much mains power is consumed as a result of running a given workload. Therefore, to put Kepler's metrics in context, a simple experimental control measures how much power is consumed over the lifetime of a single pod running a consistently reproducible workload; you then install Kepler, look at its power consumption data for the same pod running the same workload, and compare it with the control. This method gives you a way to assess the strengths and limitations of Project Kepler. The methodology for the control is:
1. Take a single-node Kubernetes cluster in an "idle" state (i.e., not running any workloads over and above the basic control plane and Prometheus stack, exporters, etc.).
2. Instrument the node to measure its entire power draw frequently and accurately from the mains by means of a "smart switch."
3. Run the test workload and examine the metrics to see the node's increase in power consumption over and above the steady state.
This process will show the "true" amount of power consumed by the test workload and thus its carbon footprint. Obviously, this method is only useful when you have strict control over what's running on the cluster; it would be no good in a production environment that needs to run multiple users' workloads simultaneously.
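The arithmetic behind the three steps above can be sketched in a few lines of Python: Average the idle readings to get a baseline, integrate only the power drawn above that baseline while the workload runs, and convert the result to a carbon figure. All names and numbers here are hypothetical, and the grid carbon intensity in particular is a placeholder you would replace with a value for your own electricity supply.

```python
from statistics import mean


def workload_energy_wh(idle_watts, run_samples):
    """Energy in Wh drawn above the idle baseline during a workload run.

    idle_watts:  power readings (W) taken while the node is idle
    run_samples: (unix_seconds, watts) readings taken during the workload
    """
    baseline = mean(idle_watts)
    ordered = sorted(run_samples)
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(ordered, ordered[1:]):
        # Trapezoidal rule applied to the power in excess of the baseline.
        excess = ((p0 - baseline) + (p1 - baseline)) / 2.0
        joules += excess * (t1 - t0)
    return joules / 3600.0


def carbon_grams(energy_wh, grid_g_per_kwh):
    """Convert energy (Wh) to grams of CO2 for a given grid intensity."""
    return energy_wh / 1000.0 * grid_g_per_kwh


# Made-up readings: about 18 W idle, about 48 W under load for a few minutes.
idle = [18.0, 18.4, 18.2]
run = [(0, 18.2), (60, 48.2), (120, 48.2), (180, 18.2)]
energy = workload_energy_wh(idle, run)
# 400 g CO2/kWh is an assumed placeholder, not a measured grid figure.
print(energy, "Wh,", carbon_grams(energy, 400.0), "g CO2")
```

The sketch does not clamp readings that dip below the baseline, so noise in the idle measurement carries straight through to the result; a longer idle observation period reduces that error.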