How It Works
Among the several options, if you manage your own automation, the ideal approach is to remove the existing FileStore OSDs from the cluster and recreate them as BlueStore OSDs. Mixing BlueStore and FileStore in the same cluster is not a problem from Ceph's point of view, and all Ceph deployment tools support creating new OSDs. If you already use the cephadm deployment tool, which is part of Ceph's own automation environment (Ceph-Mgr), you will find the appropriate commands for replacing OSDs.
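The following is a minimal sketch of how such a replacement might look on a cephadm-managed cluster; the OSD ID 12, the host name node01, and the device /dev/sdb are placeholders for your own values:
# Drain the FileStore OSD, keep its ID reserved for the replacement,
# and wipe the device once it no longer holds any data
ceph orch osd rm 12 --replace --zap
# Watch the progress of the removal
ceph orch osd rm status
# Once the device is free, recreate the OSD (now as BlueStore) manually
# if no service specification picks up the empty device automatically
ceph orch daemon add osd node01:/dev/sdb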
Before migrating to BlueStore, first update your cluster to the latest Ceph version (17.2 at press time). Afterward, the individual OSDs can be removed at the command line with the familiar tools and added back to the cluster as new OSDs. The usual restrictions apply: In smaller installations, take care to remove only a few OSDs from the cluster at a time to avoid massive recovery operations on the back end. Either way, you end up copying all the objects back and forth within the cluster several times to replace all the OSDs, and if you trigger this operation for too many OSDs at the same time, you risk disturbing ongoing operations.
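If you manage the OSDs yourself rather than through cephadm, the classic approach with ceph-volume might look like the following sketch; again, the OSD ID and device are placeholders, and the wait loop is one way of making sure the OSD no longer holds data before it is destroyed:
ID=12
DEV=/dev/sdb
ceph osd out $ID
# Wait until the data has been rebalanced away from the OSD
while ! ceph osd safe-to-destroy osd.$ID; do sleep 60; done
systemctl stop ceph-osd@$ID
ceph osd destroy $ID --yes-i-really-mean-it
# Wipe the old FileStore data and recreate the OSD as BlueStore,
# reusing the previous OSD ID
ceph-volume lvm zap $DEV
ceph-volume lvm create --bluestore --data $DEV --osd-id $ID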
Neither Ceph nor cephadm has an explicit option for converting existing OSDs from FileStore to BlueStore. One big advantage of Ceph is that individual OSDs can be removed or fail at any time without data being lost or performance suffering, so replacing an OSD is already a routine operation. From the developers' point of view, it would be nonsensical to write extra code for what is a trivial operation anyway.
More Visibility
One of the most difficult tasks in operating a Ceph cluster from the administrator's point of view is getting a quick overview of the cluster's status. Much has happened in this respect in recent years. One example is Ceph Dashboard, which visualizes the cluster's vital parameters and accesses data from Ceph-Mgr. One other thing is also true: If you really want to come to grips with a Ceph cluster, you cannot escape the command line.
An excellent example pertains to placement groups (PGs). Under the hood, the RADOS object store is broken down into several layers before you get to the binary objects themselves. Each binary object belongs to a PG, and PGs are logically separated from each other by pools, primarily for performance reasons: Large clusters can quickly grow to several million binary objects, so if RADOS had to decide on a per-object basis, for example, which OSD an object is stored on, performance would suffer, and the approach would not work at all for very large object sets. Placement groups prevent this problem, and for some years now, Ceph has also supported dynamically scaling the number of PGs in a pool. With the command
ceph pg dump
you can display the current status of each PG in the cluster: whether it is located on the OSDs specified by the Controlled Replication Under Scalable Hashing (CRUSH) algorithm and whether a recovery is currently taking place.
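If you only need a quick overview rather than the full dump, a few shorter queries help; this is a sketch, and the PG ID 2.1a and the pool name rbd are placeholders:
# Compact list: one line per PG with its state and the up/acting OSDs
ceph pg dump pgs_brief
# One-line summary of all PG states in the cluster
ceph pg stat
# Detailed JSON status of a single PG
ceph pg 2.1a query
# Let Ceph scale the number of PGs of a pool automatically
ceph osd pool set rbd pg_autoscale_mode on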
In Ceph 17.2, the tool has a new output column that tells you whether a PG is currently subject to Ceph's own mechanism for checking data consistency (aka scrubbing). Scrubbing, and especially the more thorough deep scrubbing, can slow down access to the affected PGs considerably. In the past, it was not uncommon to launch an extensive performance analysis, only to find a few minutes later that the problem had disappeared on its own. The extended ceph pg dump output helps you avoid this kind of detour.
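To check quickly whether scrubbing is the culprit behind a sudden slowdown, you can also filter the PG list by state; a small sketch, assuming a current Ceph release:
# List all PGs whose state includes scrubbing or deep scrubbing
ceph pg ls scrubbing
# Alternatively, filter the brief dump by its state column
ceph pg dump pgs_brief 2>/dev/null | awk '$2 ~ /scrubbing/'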
Hyperconvergence Caution
Admins of hyperconverged setups need to pay special attention to OSD memory autotuning (osd_memory_target_autotune) in Ceph 17.2: The associated autotune_memory_target_ratio is set to 0.7 out of the box, which means Ceph-Mgr configures the OSDs to occupy up to 70 percent of a system's available RAM.
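Before changing anything, it is worth checking which values are currently active; a short sketch, assuming a cephadm-managed cluster:
# Is autotuning of the OSD memory target switched on?
ceph config get osd osd_memory_target_autotune
# Which share of the host RAM does cephadm distribute among the OSDs?
ceph config get mgr mgr/cephadm/autotune_memory_target_ratio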
Of course, the purpose of hyperconverged setups is to run virtual instances alongside the Ceph components. Those instances will not be happy if they have to make do with less than 30 percent of the system's available memory: After all, the running kernel and its services also need some RAM.
This example shows once again that the Ceph developers are quite critical of hyperconverged setups, even though Red Hat now officially supports them and markets them as a cost-effective alternative to Ceph-only clusters. Anyone running such a setup should therefore limit the OSDs to a total of 20 percent of the available RAM when upgrading to Ceph 17.2:
ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2
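Afterward, you can check whether the lower limit has actually reached the OSDs; another short sketch, assuming cephadm manages the daemons:
# Show the memory targets Ceph-Mgr has set for the individual OSDs
ceph config dump | grep osd_memory_target
# Compare the actual memory use of the OSD daemons with their limits
ceph orch ps --daemon-type osd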