Fixing Ceph performance problems
First Aid Kit
How Writes Work
To avoid the entire write load having to be handled by a central instance, the Ceph developers opted for a decentralized approach. Clients implement the CRUSH algorithm [2], which produces pseudorandom results, but always the same results for the same layout of the Ceph cluster. This is possible because the client knows the list of all OSDs in the form of the OSDmap that the MONs maintain for it.
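To see which OSDs CRUSH selects for a given object, you can ask the MONs directly. The following sketch uses the Python librados bindings and is not part of Ceph's documentation; the default config path, the pool name rbd, and the object name data-000001 are assumptions you would adapt to your setup.

# Ask the MONs which OSDs CRUSH maps an object to, the equivalent of
# 'ceph osd map <pool> <object>' (sketch; pool and object names are assumed).
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    cmd = json.dumps({
        "prefix": "osd map",
        "pool": "rbd",            # assumed pool name
        "object": "data-000001",  # hypothetical object name
        "format": "json",
    })
    ret, out, errs = cluster.mon_command(cmd, b'')
    if ret != 0:
        raise RuntimeError(errs)
    mapping = json.loads(out)
    # 'acting' lists the OSD IDs serving the object's placement group,
    # with the primary OSD first.
    print("PG:", mapping.get("pgid"), "acting set:", mapping.get("acting"))
finally:
    cluster.shutdown()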
If a client wants to store something, it first divides the data into binary objects, which in the standard configuration are a maximum of 4MB in size. The client then uses CRUSH to compute the appropriate primary object storage daemon (OSD) for each of these objects and uploads the object to it.
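In practice, the chunking is done by the client libraries (librbd, CephFS, RGW), but with librados you can upload a single binary object yourself, which is enough to illustrate the client side of a write. A minimal sketch, again assuming a default configuration and a pool named rbd:

# Minimal client-side write via librados (sketch; pool and object names
# are assumptions). The library computes the placement with CRUSH and
# sends the object to the primary OSD.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')                 # assumed pool name
    try:
        ioctx.write_full('demo-object', b'x' * 4096)  # upload one binary object
        print("wrote demo-object:", ioctx.stat('demo-object'))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()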
The OSDs join the fray as soon as the binary objects arrive: They use CRUSH to calculate the secondary OSDs for each object and copy it to them. Only when the object has reached the journals of as many OSDs as the pool's size parameter specifies (3 out of the box) is the acknowledgement for the write sent to the client, and only then is the write considered complete.
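How many copies a write must reach before it is acknowledged is governed by the pool parameters size and min_size, which you can inspect at run time with the equivalent of ceph osd pool get. A hedged sketch with the Python rados bindings, once more assuming a pool named rbd:

# Read the replication settings of a pool (sketch; the pool name 'rbd'
# is an assumption). Equivalent to 'ceph osd pool get rbd size|min_size'.
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    for var in ("size", "min_size"):
        cmd = json.dumps({"prefix": "osd pool get",
                          "pool": "rbd",   # assumed pool name
                          "var": var,
                          "format": "json"})
        ret, out, errs = cluster.mon_command(cmd, b'')
        if ret != 0:
            raise RuntimeError(errs)
        print(json.loads(out))   # e.g. {'pool': 'rbd', 'size': 3}
finally:
    cluster.shutdown()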
Layer Cake
Although Ceph is not overly versatile in direct comparison with other cloud solutions, such as software-defined networking, it is still a very good one. Nevertheless, the write or read performance of the Ceph cluster in this fictional case is poor. A number of components come into question as possible culprits, and their complexity is reflected in the effort of monitoring the different layers for speed.
The Basics
Sensible performance monitoring in Ceph ideally starts with the basics. On the Ceph nodes, too, admins want to keep track of the state of the individual hard drives. Although Ceph detects failed drives automatically, it is not uncommon for hard disks or SSDs to develop minor defects: Outwardly, they still claim to be working, but write operations take ages, even though they do not abort with an error message.
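If you want to check the drives underneath the OSDs yourself, a small wrapper around smartctl is often enough. The device list below is purely an assumption; adapt it to the disks your OSDs actually use, and note that smartmontools must be installed.

# Quick SMART health check for the disks behind the OSDs (sketch;
# the device list is hypothetical).
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]   # assumed OSD data disks

for dev in DEVICES:
    # 'smartctl -H' prints the drive's overall health self-assessment.
    result = subprocess.run(["smartctl", "-H", dev],
                            capture_output=True, text=True)
    verdict = "OK" if "PASSED" in result.stdout else "CHECK"
    print(f"{dev}: {verdict}")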
Ceph itself detects these slow writes and displays them (more about this later). It does no harm, however, to catch the tell-tale DriveReady SeekComplete errors in the central logging system as well. Ceph setups today are regularly monitored with modern software such as Prometheus, but it pays to keep an eye on the classic vital signs, too. Local hardware problems, such as failing RAM or simple overheating, tend to result in slower responses from Ceph.
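The per-OSD latencies that point to a slow drive can be pulled from the MONs with the equivalent of ceph osd perf. A sketch using the Python rados bindings; because the exact JSON layout differs between Ceph releases, the raw output is simply printed here.

# Dump per-OSD latency statistics, the equivalent of 'ceph osd perf'
# (sketch; default config and keyring paths are assumed).
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ret, out, errs = cluster.mon_command(
        json.dumps({"prefix": "osd perf", "format": "json"}), b'')
    if ret != 0:
        raise RuntimeError(errs)
    # The JSON layout varies between releases; the commit/apply latency
    # per OSD is what points to a slow drive.
    print(json.dumps(json.loads(out), indent=2))
finally:
    cluster.shutdown()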