Management improvements, memory scaling, and EOL for FileStore

Refreshed

How It Works

If you manage your own automation, the most practical of the several options is to remove the existing FileStore OSDs from the cluster and recreate them as BlueStore OSDs. Mixing BlueStore and FileStore in the same cluster is not a problem from Ceph's point of view, and all Ceph deployment tools can create new OSDs. If you already use the cephadm deployment tool, which plugs into Ceph's own automation environment (Ceph-Mgr), it provides the appropriate commands for replacing OSDs.
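On a cephadm-managed cluster, for example, swapping out a single OSD might look like the following sketch. The OSD ID 7, the hostname node01, and the device /dev/sdb are placeholders for your own environment; depending on your service specification, cephadm might even recreate the OSD on the zapped device automatically, making the last command unnecessary:

# drain the OSD, keep its ID for reuse, and wipe the device afterward
ceph orch osd rm 7 --replace --zap
# watch the draining progress
ceph orch osd rm status
# re-add the device as a BlueStore OSD if no service spec picks it up
ceph orch daemon add osd node01:/dev/sdb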

Before migrating to BlueStore, first update your cluster to the latest Ceph version (17.2 at the time of writing). Afterward, the individual OSDs can be removed at the command line with the familiar tools and added back to the cluster as new OSDs. The usual restrictions apply: In smaller installations, take care to remove only a few OSDs from the cluster at a time to avoid massive recovery operations on the back end. Either way, you end up copying all the objects back and forth within the cluster several times to replace all the OSDs, and if you trigger this operation for too many OSDs at the same time, you run the risk of disturbing ongoing operations.
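Without an orchestrator, replacing a single OSD boils down to the classic sequence sketched below, again with OSD ID 7 and device /dev/sdb as placeholders. The ceph-volume steps run on the OSD host itself, and containerized setups stop the daemon differently; in any case, wait for the cluster to settle before touching the next OSD:

# take the OSD out and let the cluster drain it
ceph osd out 7
# repeat until Ceph confirms the OSD can go
ceph osd safe-to-destroy osd.7
systemctl stop ceph-osd@7
ceph osd purge 7 --yes-i-really-mean-it
# recreate the OSD on the same device with the BlueStore back end
ceph-volume lvm zap /dev/sdb --destroy
ceph-volume lvm create --bluestore --data /dev/sdb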

Neither Ceph nor cephadm has an explicit option to convert existing OSDs from FileStore to BlueStore. One big advantage of Ceph is that existing OSDs can fail or be removed without data being lost or performance suffering, so from a developer's point of view, it would be nonsensical to write extra code for what is already a trivial operation.

More Visibility

One of the most difficult tasks in operating a Ceph cluster from the administrator's point of view is getting a quick overview of the cluster's status. Much has happened in this respect in recent years. One example is the Ceph Dashboard, which visualizes the cluster's vital parameters and draws its data from Ceph-Mgr. One thing is still true, though: If you really want to come to grips with a Ceph cluster, you cannot escape the command line.

An excellent example pertains to placement groups (PGs). Under the hood, the RADOS object store is broken down into several layers before you get to the binary objects themselves. Each binary object belongs to a PG, and the PGs in turn are grouped into pools. This intermediate layer exists primarily for performance reasons: Huge clusters quickly grow to several million binary objects, and if RADOS itself had to decide, separately for every single object, on which OSD it is stored, performance would suffer and very large object sets would become unmanageable. Placement groups prevent this problem, and for some years now, Ceph has also supported dynamically scaling the number of PGs per pool. With the command

ceph pg dump

you can display the current status of each PG in the cluster, whether it is located on the OSDs specified by the controlled replication under scalable hashing (CRUSH) algorithm and whether a recovery is currently taking place.
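If you are only interested in a single PG, you can also query it directly: ceph pg map shows the up and acting OSD sets, and query delivers the full internal state. The PG ID 2.1f here is just an example:

ceph pg map 2.1f
ceph pg 2.1f query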

In Ceph 17.2, the tool has a new output column that tells you whether a PG is currently subject to Ceph's own mechanism for checking data consistency (aka scrubbing). Scrubbing, and especially the more thorough deep scrubbing, can make access to individual PGs significantly slower than usual. It was not uncommon in the past to initiate an extensive performance analysis, only to realize a few minutes later that the problems had disappeared on their own. The extended output of ceph pg dump helps avoid this wasted effort.
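To check quickly whether scrubbing is the culprit, it is enough to filter the state column of the abbreviated dump for affected PGs. The exact column layout varies slightly between releases, so treat the following as a rough sketch:

ceph pg dump pgs_brief 2>/dev/null | grep -E 'scrubbing|deep'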

Hyperconvergence Caution

Admins of hyperconverged setups need to pay special attention to Ceph-Mgr's automatic OSD memory tuning (osd_memory_target_autotune) in Ceph 17.2. The ratio it applies, autotune_memory_target_ratio, is set to 0.7 out of the box: On systems with this setting, Ceph-Mgr configures the OSDs to occupy 70 percent of a system's available RAM.

Of course, the purpose of hyperconverged setups is to run virtual instances in addition to the Ceph components. Those instances will not be happy if they have to make do with less than 30 percent of the system's available memory: After all, the running kernel and its services also claim some RAM.

This example shows once again that the Ceph developers are very critical of hyperconverged setups, even though Red Hat now officially supports them and markets them as a cost-effective alternative to Ceph-only clusters. Therefore, anyone running such a setup should be sure to limit the OSDs to a total of 20 percent of the available RAM with the command

ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2

when upgrading to Ceph 17.2.
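Afterward, you can verify the new ratio and, assuming the autotuner is active, the memory targets that Ceph-Mgr has pushed down to the individual OSDs:

ceph config get mgr mgr/cephadm/autotune_memory_target_ratio
ceph config dump | grep osd_memory_target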
