A new approach to more attractive histograms in Prometheus
Off the Chart
Switchover
To use native histograms in Prometheus, you need to enable them with the --enable-feature=native-histograms flag. With the flag set, Prometheus collects data from its targets in the Protobuf format. You will want to use the latest available version of Prometheus, because this feature is considered experimental and is still subject to ongoing change. In Go, the client_golang library supports native histograms as of v1.14; you need to modify your code as shown in Listing 2.
Listing 2
Mod for Native Histograms
  requestDuration: prometheus.NewHistogramVec(
      prometheus.HistogramOpts{
          Name: "prometheus_http_request_duration_seconds",
          Help: "Histogram of latencies for HTTP requests.",
-         Buckets: []float64{.1, .2, .4, 1, 3, 8, 20, 60, 120},
+         NativeHistogramBucketFactor: 1.1,
+         NativeHistogramMaxBucketNumber: 150,
      },
      []string{"handler"},
  )
In slightly simplified terms, native histograms use exponentially staggered buckets to cover the entire float64 range of values. The width of the buckets increases by a constant factor that can be adjusted to balance overhead and accuracy. A factor of 1.1 (each successive bucket is at most 10 percent wider than the previous one) provides a good compromise and results in eight buckets per power of two (e.g., between 1 and 2, 2 and 4, 4 and 8, etc.).
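The relationship between the configured factor and the "eight buckets per power of two" can be sketched in a few lines of Go. The sketch assumes that the growth factor actually used is the largest value of the form 2^(2^-n) that does not exceed the configured factor, with n between -4 and 8; the function names schemaForFactor and growthFactor are hypothetical and not part of client_golang.

```go
package main

import (
	"fmt"
	"math"
)

// growthFactor returns 2^(2^-n), the ratio between the upper bounds of
// two neighboring buckets at resolution level n.
func growthFactor(n int) float64 {
	return math.Exp2(math.Exp2(float64(-n)))
}

// schemaForFactor returns the coarsest resolution level n in [-4, 8]
// whose growth factor does not exceed the requested factor.
// Assumption: this mirrors the selection rule only in spirit.
func schemaForFactor(factor float64) int {
	for n := -4; n <= 8; n++ {
		if growthFactor(n) <= factor {
			return n
		}
	}
	return 8 // finest level considered in this sketch
}

func main() {
	n := schemaForFactor(1.1)
	base := growthFactor(n)
	fmt.Printf("level %d, growth factor %.4f\n", n, base)

	// Upper bounds of the buckets between 1 and 2
	// (n is non-negative here, so the shift is safe).
	for i := 1; i <= 1<<n; i++ {
		fmt.Printf("%.4f ", math.Pow(base, float64(i)))
	}
	fmt.Println()
}
```

For a factor of 1.1, the sketch settles on level 3, whose growth factor is 2^(1/8), or roughly 1.09. That is why the interval between 1 and 2 ends up with exactly eight buckets.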
You can specify the maximum number of buckets with the NativeHistogramMaxBucketNumber option. Without this setting, the number of buckets could grow uncontrollably and cause high memory usage. Limiting the histogram to a defined number of buckets allows it to recalculate its bucket boundaries and merge some buckets, if needed, to reduce memory usage.
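One way to picture this aggregation is to halve the resolution: Every pair of neighboring buckets is merged into one, which squares the growth factor. The following Go sketch illustrates only that idea and is not the client_golang implementation, which may use other strategies as well. It assumes that the bucket with index i covers the range (base^(i-1), base^i]; the function name halveResolution is hypothetical.

```go
package main

import "fmt"

// halveResolution merges pairs of adjacent buckets into one.
// Assumption: bucket idx covers (base^(idx-1), base^idx]. After the
// base is squared, old indexes 2j-1 and 2j both fall into new bucket j,
// so the new index is ceil(idx/2).
func halveResolution(buckets map[int]uint64) map[int]uint64 {
	merged := make(map[int]uint64, len(buckets)/2+1)
	for idx, count := range buckets {
		// (idx+1)>>1 equals ceil(idx/2), also for negative indexes,
		// because >> on a signed int rounds toward negative infinity.
		merged[(idx+1)>>1] += count
	}
	return merged
}

func main() {
	// Sparse buckets: only populated indexes are stored.
	buckets := map[int]uint64{-1: 1, 0: 4, 1: 2, 2: 3, 3: 5}
	merged := halveResolution(buckets)
	fmt.Println(len(buckets), "buckets before,", len(merged), "after")
}
```

The total number of observations stays the same; only the resolution drops, which is the trade-off the bucket limit makes in exchange for bounded memory.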
Results
Figure 3 shows a comparison for computing the 95th percentile of a specific query for identical data. At each point on the x axis, the corresponding y value represents the threshold below which 95 percent of the requests fell at that point. On the left you see the computations with a legacy histogram, and on the right with the native variant.
The comparison shows that native histograms provide a far more accurate estimate of the percentile value. In contrast, legacy histograms can misleadingly suggest the values are identical before and after a peak. In fact, the percentile value in the example is about 350 ms before the peak and about 500 ms after it. This difference of 150 ms is not represented by legacy histograms.
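To see why the native variant can tell these two values apart, it helps to check which exponential bucket each one lands in. The Go sketch below assumes eight buckets per power of two (the resolution that a factor of 1.1 yields) and that bucket i covers (2^((i-1)/8), 2^(i/8)]. The helper names are hypothetical, and the bucket layout of the legacy histogram behind the figure is not known, so only the native side is shown.

```go
package main

import (
	"fmt"
	"math"
)

// Assumption: resolution level 3, that is, 2^3 = 8 buckets per power of two.
const schema = 3

// bucketIndex returns the index of the bucket that contains the
// positive value v.
func bucketIndex(v float64) int {
	return int(math.Ceil(math.Log2(v) * math.Exp2(schema)))
}

// bucketBounds returns the exclusive lower and inclusive upper bound
// of the bucket with the given index.
func bucketBounds(idx int) (lower, upper float64) {
	scale := math.Exp2(schema)
	return math.Exp2(float64(idx-1) / scale), math.Exp2(float64(idx) / scale)
}

func main() {
	for _, v := range []float64{0.35, 0.5} { // seconds
		idx := bucketIndex(v)
		lo, hi := bucketBounds(idx)
		fmt.Printf("%.2fs -> bucket %d: (%.4f, %.4f]\n", v, idx, lo, hi)
	}
}
```

The two values fall into different buckets several steps apart, and each bucket spans only about 9 percent of its upper bound, so a percentile estimate interpolated within a single bucket cannot be off by much.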
Conclusions
The Prometheus data model has experienced a major change with the addition of native histograms [3]. This new feature offers a more accurate representation of the data and allows for superior computation of percentiles compared with legacy histograms. The move to Protobuf to collect metrics is a significant upgrade.
Infos
[1] Native histograms in Prometheus: https://docs.google.com/document/d/1cLNv3aufPZb3fNfaJgdaRBZsInZKKIHo9E6HinJVbpM/edit
[2] Native histograms in Prometheus (keynote video): https://promcon.io/2022-munich/talks/native-histograms-in-prometheus
[3] PromQL for native histograms (keynote video): https://promcon.io/2022-munich/talks/promql-for-native-histograms