Fixing Ceph performance problems

First Aid Kit

Look Closely

Whether the metric data is displayed by the Ceph Dashboard or by Prometheus and Grafana, the central question remains the same: Which data do you need to track? Four values play a central role in performance:

  • ceph.commit_latency_ms specifies the average time Ceph needs for operations, which include writing to a drive or initiating a write operation on a secondary OSD.
  • ceph.apply_latency_ms indicates the average time it takes for data to be applied to the OSDs' backing storage.
  • ceph.read_bytes_sec and ceph.write_bytes_sec provide an initial overview of what is going on in the cluster.

The same values can also be read out for individual pools, which are logical areas within RADOS. However, don't expect too much from these per-pool values if the cluster is built in the usual way and all pools point to the same drives in the background. If you have different pools for different types of drives in the cluster (e.g., HDDs and SSDs), the per-pool figures are probably more useful.
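If you prefer scripts to dashboards, the same figures can also be pulled from the JSON output of the ceph command-line client. The following Python sketch illustrates one way to do this; it assumes a working client keyring and the ceph binary in the path, and the JSON key names it touches (osd_perf_infos, client_io_rate, and so on) vary slightly between Ceph releases, so verify them against your installation before relying on the numbers.

#!/usr/bin/env python3
"""Minimal sketch: read the key Ceph values with the ceph CLI's JSON output."""
import json
import subprocess


def ceph_json(*args):
    """Run a ceph subcommand with JSON output and return the parsed result."""
    out = subprocess.check_output(["ceph", *args, "--format", "json"])
    return json.loads(out)


def osd_latencies():
    """Average commit/apply latency (ms) across all OSDs (ceph osd perf)."""
    perf = ceph_json("osd", "perf")
    # Older releases return the list at the top level, newer ones nest it
    # under "osdstats" -- handle both layouts.
    infos = perf.get("osdstats", perf).get("osd_perf_infos", [])
    if not infos:
        return 0.0, 0.0
    commit = sum(o["perf_stats"]["commit_latency_ms"] for o in infos) / len(infos)
    apply_ = sum(o["perf_stats"]["apply_latency_ms"] for o in infos) / len(infos)
    return commit, apply_


def cluster_throughput():
    """Current client read/write rates in bytes/s (pgmap section of ceph status)."""
    pgmap = ceph_json("status")["pgmap"]
    # The rate keys only show up while clients are actually doing I/O.
    return pgmap.get("read_bytes_sec", 0), pgmap.get("write_bytes_sec", 0)


def pool_throughput():
    """Per-pool client I/O rates (ceph osd pool stats)."""
    stats = {}
    for pool in ceph_json("osd", "pool", "stats"):
        rates = pool.get("client_io_rate", {})
        stats[pool["pool_name"]] = (rates.get("read_bytes_sec", 0),
                                    rates.get("write_bytes_sec", 0))
    return stats


if __name__ == "__main__":
    commit, apply_ = osd_latencies()
    rd, wr = cluster_throughput()
    print(f"commit latency: {commit:.1f} ms, apply latency: {apply_:.1f} ms")
    print(f"cluster: {rd / 2**20:.1f} MiB/s read, {wr / 2**20:.1f} MiB/s write")
    for name, (prd, pwr) in pool_throughput().items():
        print(f"pool {name}: {prd / 2**20:.1f} MiB/s read, "
              f"{pwr / 2**20:.1f} MiB/s write")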

In Practice

With the described approaches, problems in Ceph can be detected reliably, but this is far from the end of the story. Many of the parameters that the Ceph manager delivers to the outside world are generated from the metric values of the individual OSDs in the system. However, they can also be read directly from the outside through the admin socket of each individual OSD. If you want, you can configure Prometheus and Grafana to feed every operation on every OSD in the installation into the monitoring system. The sheer volume, however, can have devastating effects – a jumble of thousands of values quickly becomes confusing.
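As an illustration of such a direct query, the following Python sketch reads a few latency counters from a single OSD with ceph daemon osd.<ID> perf dump. It is only a sketch: it has to run on the host where the OSD lives, and the counter names used here (op_r_latency, op_w_latency, op_latency) and the precomputed avgtime field are assumptions that can differ between Ceph releases.

#!/usr/bin/env python3
"""Minimal sketch: read selected latency counters from one OSD's admin socket."""
import json
import subprocess
import sys


def perf_dump(osd_id):
    """Fetch the full performance counter dump of a single OSD."""
    out = subprocess.check_output(["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"])
    return json.loads(out)


def avg_latency(counter):
    """Average latency in seconds for a latency-type counter."""
    # Newer releases ship a precomputed "avgtime"; otherwise derive the
    # average from the running sum and the event count.
    if "avgtime" in counter:
        return counter["avgtime"]
    return counter["sum"] / counter["avgcount"] if counter["avgcount"] else 0.0


if __name__ == "__main__":
    osd_id = sys.argv[1] if len(sys.argv) > 1 else "0"
    osd = perf_dump(osd_id).get("osd", {})
    for name in ("op_r_latency", "op_w_latency", "op_latency"):
        if name in osd:
            print(f"osd.{osd_id} {name}: {avg_latency(osd[name]) * 1000:.2f} ms")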

It can also be difficult to identify the causes of certain metric values in Ceph, but that's the other half of the Ceph story: Knowing that the cluster is experiencing slow writes is all well and good, but what you really need is information about the cause of the problem. The following example from my personal experience gives you an impression of how monitoring details can be used to track down a performance problem in Ceph. However, this process cannot be meaningfully automated.

The Problem

The initial setup is a Ceph cluster with about 2.5PB gross capacity that is mainly used for OpenStack. When you start a virtual machine that uses an RBD image stored in Ceph as a hard drive for the root filesystem, an annoying effect regularly occurs: The virtual machines (VMs) experience an I/O stall for several minutes and are unusable during this period. Afterward, the situation normalizes for some time, but the problem returns regularly. During the periods when the problem does not exist, writes to the volumes of the VMs deliver a passable 1.5GBps.

The monitoring systems I used did indeed display the slow writes described above, but I could not identify a pattern; instead, the slow writes were distributed across all OSDs in the system. It quickly became clear that the problem was not specific to OpenStack, because a local RBD image without access through the cloud solution led to the same issues.
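For reference, a local test of this kind can be scripted in a few lines without touching the cloud layer at all. The following Python sketch is a rough illustration under assumptions: it expects a pool named rbd, admin credentials on the client, and an rbd version whose bench subcommand understands --io-type (older releases ship rbd bench-write instead); pool, image name, and sizes are placeholders to adjust.

#!/usr/bin/env python3
"""Minimal sketch: hammer a throwaway RBD image with writes, then clean up."""
import subprocess

POOL = "rbd"                   # hypothetical pool name -- adjust to your setup
IMAGE = f"{POOL}/stall-test"   # throwaway test image

# Create the test image, run sequential 4MB writes against it, and remove
# it again. Watching ceph status or the dashboards in parallel shows
# whether the stalls appear with no cloud layer involved.
subprocess.check_call(["rbd", "create", "--size", "10G", IMAGE])
try:
    subprocess.check_call(["rbd", "bench",
                           "--io-type", "write",
                           "--io-size", "4M",
                           "--io-threads", "16",
                           "--io-total", "10G",
                           IMAGE])
finally:
    subprocess.check_call(["rbd", "rm", IMAGE])

Because rbd bench writes through librbd directly, this kind of test exercises the same data path as the VMs while taking the hypervisor and every other layer out of the equation.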
