Lead Image © Kritiya Sumpun, 123RF.com

Fixing Ceph performance problems

First Aid Kit

Article from ADMIN 59/2020
Ceph is powerful and efficient, but wrong settings or faulty hardware can cause the decentralized object store to stumble.

Ceph has become the de facto standard for software-defined storage (SDS). Companies building large, scalable environments today are increasingly unlikely to go with classic network-attached storage (NAS) or storage area network (SAN) appliances; instead, they prefer distributed object storage like Ceph, which is now part of Red Hat.

Unlike classic storage solutions, Ceph is designed for scalability and longevity. Because Ceph is easy to use with off-the-shelf hardware, enterprises do not have to worry about only being able to source spare parts directly from the manufacturer. When a hardware warranty is coming to an end, for example, you don't have to replace a Ceph store completely with a new solution. Instead, you remove the affected servers from the system and add new ones without disrupting ongoing operations.

The other side of the coin is that the central role Ceph plays makes performance problems particularly critical. Ceph is extremely complex: If the object store runs slowly, you need to consider many components. In the best case, only one component is responsible for bad performance. If you are less lucky, performance problems arise from the interaction of several components in the cluster, making it correspondingly difficult to debug.

After a short refresher on Ceph basics, I offer useful tips for everyday monitoring of Ceph in the data center, especially in terms of performance. In addition to preventive topics, I also deal with the question of how admins can handle persistent Ceph performance problems with on-board resources.

The Setup

Over weeks and months, a new Ceph cluster is designed and implemented in line with all of the current best practices, with a fast 25Gbps network over redundant Link Aggregation Control Protocol (LACP) links. A dedicated network with its own Ethernet hardware for traffic between the Ceph storage nodes ensures that client data traffic and the traffic for Ceph's internal replication do not slow each other down.
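In ceph.conf, this split between a client-facing network and a replication network is typically expressed with the public network and cluster network options; the subnets below are placeholders chosen purely for illustration:

[global]
# Client and MON traffic
public network = 192.168.10.0/24
# OSD replication and recovery traffic
cluster network = 192.168.20.0/24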

Although the cluster is built on slow hard drives, the developers' recommendations have been followed meticulously, and the slow drives are fronted by a kind of SSD cache. As soon as data is written to this cache, the write is considered complete for the client, making it look client-side as if users were writing to an SSD-only cluster.

Initially, the cluster delivers the desired performance, but suddenly packets are just crawling across the wire, and users' patience is running out. The time Ceph takes to get things done seems endless, and nobody really knows where the problem might lie. Two questions arise: How do you discover the root cause of the slowdown? And how do you continuously monitor your cluster so that you can identify potential problems before cluster users even notice them? The answers to both questions require a basic understanding of how Ceph works.
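As a first taste of what such monitoring can look like, the following minimal sketch polls the cluster state with Ceph's Python bindings (the python3-rados package); it assumes a readable /etc/ceph/ceph.conf and admin keyring and uses the same status call that the ceph status command issues:

#!/usr/bin/env python3
# Minimal health poller via librados; requires Ceph Luminous or later.
import json
import time

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

try:
    while True:
        ret, out, _ = cluster.mon_command(
            json.dumps({'prefix': 'status', 'format': 'json'}), b'')
        if ret == 0:
            health = json.loads(out)['health']['status']  # e.g., HEALTH_OK
            if health != 'HEALTH_OK':
                print('Cluster degraded:', health)
        time.sleep(30)
finally:
    cluster.shutdown()

In a real deployment, the print call would of course feed a monitoring system rather than the console.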

RADOS and Backends

Red Hat markets a collection of tools under the Ceph product name that, used together, form a complete storage solution. The core of Ceph is the distributed object store RADOS, which is called an object store because it handles every incoming snippet of information as a binary object.

Ceph achieves its core feature of distributed storage by splitting up these files and putting them back together again later. When a user stores a file, the client breaks it down into objects of 4MB each before uploading them, and RADOS then distributes the objects across all of its drives.
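To make the splitting tangible, the following sketch uses the librados Python bindings to do by hand what front ends such as RBD or CephFS do internally: it chops a local file into 4MB pieces and stores each piece as a separate RADOS object. The pool name, file name, and object naming scheme are assumptions chosen purely for illustration:

import rados

CHUNK = 4 * 1024 * 1024  # 4MB, the object size Ceph clients typically use

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('testpool')      # hypothetical pool name

try:
    with open('backup.img', 'rb') as f:     # hypothetical input file
        index = 0
        while True:
            piece = f.read(CHUNK)
            if not piece:
                break
            # Each 4MB piece becomes its own object; RADOS decides which
            # OSDs end up physically storing it.
            ioctx.write_full('backup.img.%08d' % index, piece)
            index += 1
finally:
    ioctx.close()
    cluster.shutdown()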

Ceph is logically divided into RADOS on the one hand and its frontends on the other (Figure 1). Clients have a cornucopia of options to shovel data into a Ceph cluster, but they all work only if the RADOS object store is working, which depends on several factors.

Figure 1: Clients access Ceph's internal object store, RADOS, either as a block device, as an object, or as a filesystem [1]. CC BY-SA 4.0

Of OSDs and MONs

Two components are necessary for the basic functionality of RADOS. The first is the object storage daemon (OSD), which acts as a data silo in a Ceph installation. Each OSD instance links itself to a device, making it available within the cluster over the RADOS protocol. In principle, hard drives or SSDs can be used. However, OSDs keep a journal similar to filesystems – the cache mentioned earlier – and Ceph offers the option of outsourcing this cache to fast SSDs without having to equip the entire system with SSDs.
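Which OSDs a cluster has and whether they are up can be queried at any time; the following sketch asks the MONs for the same data that ceph osd tree displays (the JSON keys shown reflect recent Ceph releases and may differ slightly in older versions):

import json

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ret, out, _ = cluster.mon_command(
        json.dumps({'prefix': 'osd tree', 'format': 'json'}), b'')
    if ret == 0:
        for node in json.loads(out)['nodes']:
            if node['type'] == 'osd':
                # e.g., 'osd.3 up' or 'osd.7 down'
                print(node['name'], node['status'])
finally:
    cluster.shutdown()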

The OSDs are accompanied by the second component, the monitoring servers (MONs), which watch over the basic functions of the object store. In Ceph, as in any distributed storage solution, something has to make sure that only those parts of the cluster that have a quorum (i.e., the majority of the MONs) are used. If a cluster breaks up because of network problems, diverging writes could otherwise take place in parallel on the now separate parts of the cluster, and a split brain would result. Avoiding this is one of the most important tasks of any storage solution.

Additionally, the MONs record which MONs and which OSDs exist (MONmap and OSDmap). The client needs both pieces of information to store binary objects in RADOS.
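Both maps, as well as the current quorum, can be retrieved with the monitor commands behind ceph quorum_status and ceph osd dump; a rough sketch with the Python bindings follows (the field names again may vary slightly between releases):

import json

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

def mon_json(prefix):
    # Send a monitor command and decode its JSON reply.
    ret, out, _ = cluster.mon_command(
        json.dumps({'prefix': prefix, 'format': 'json'}), b'')
    return json.loads(out) if ret == 0 else None

try:
    quorum = mon_json('quorum_status')
    osdmap = mon_json('osd dump')
    if quorum:
        print('MONs in quorum:', ', '.join(quorum['quorum_names']))
    if osdmap:
        print('OSDmap epoch:', osdmap['epoch'])
finally:
    cluster.shutdown()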
