Practical for the Provider
Why has Red Hat taken this step? From the provider's point of view, it offers so many advantages that it would be almost negligent not to. The intermediate container layer massively reduces the overhead of maintaining Ceph as a piece of software. In the containerized approach, a system only needs a container runtime to run Ceph, which every RHEL version provides. SUSE, Ubuntu, and others can also be made fit for containers with Podman or Docker Community Edition (CE).
All the vendor basically needs to do is maintain exactly one variant of each supported Ceph version, instead of many different versions and packages for various distributions. This is precisely why Red Hat, and ultimately IBM, are unlikely to move away from this approach. Red Hat has already classified the classic package-based deployment scenario as legacy, and the deprecated and finally unsupported states are likely to follow soon. At least you can convert an existing setup to the new format with cephadm; the Ceph documentation contains instructions.
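If you want to script the changeover, a minimal sketch in Python might look like the following. It assumes cephadm is already installed on the host, that the cluster was deployed the classic way from packages, and that the daemon names (hostnames and OSD IDs) are placeholders you would replace with your own; the adopt call follows the procedure described in the Ceph documentation.

#!/usr/bin/env python3
# Minimal sketch: adopt legacy (package-based) Ceph daemons into
# cephadm management. Daemon names below are placeholders.
import subprocess

legacy_daemons = ["mon.node01", "mgr.node01", "osd.0", "osd.1"]

for daemon in legacy_daemons:
    # 'cephadm adopt --style legacy' converts a package-managed daemon
    # on this host into a containerized, cephadm-managed daemon.
    subprocess.run(
        ["cephadm", "adopt", "--style", "legacy", "--name", daemon],
        check=True,
    )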
Changes Under the Hood
With all the hype about the Ceph meta level, you could get the impression that Ceph itself is basically no longer in technical development. Far from it: Both RADOS, the object store at the heart of the solution, and the standard front ends (the Ceph Block Device, the Ceph Object Gateway, and the CephFS filesystem) have seen significant technical advances in the recent past, as a closer look at the individual components shows.
The developers have worked continuously on the on-disk format of the Ceph object storage daemons (OSDs) and made the files found there smaller and more efficient. As you are probably aware, Ceph has been using its own mini-filesystem named BlueFS on its OSDs for a few releases; the whole enchilada goes by the name "BlueStore." Thanks to BlueStore, an OSD in Ceph no longer needs a POSIX-compatible filesystem on its block device. Instead, BlueStore uses a RocksDB-based mapping table that maps each stored object to its physical address on disk. Because RocksDB also includes a write-ahead log (WAL), it brings journaling capabilities for scenarios in which individual components of the environment fail. OSDs now provide their services on their storage devices far faster than with the legacy XFS-based solution.
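If you want to verify which back end your OSDs are actually using, a small Python sketch like the following can help. It assumes the ceph command-line tool and an admin keyring are available on the host; the osd_objectstore field name reflects what recent releases report, so treat it as an assumption rather than a guarantee.

#!/usr/bin/env python3
# Sketch: report the object store back end of every OSD in the cluster.
import json
import subprocess

# 'ceph osd metadata' returns one JSON record per OSD.
raw = subprocess.run(
    ["ceph", "osd", "metadata", "--format", "json"],
    check=True, capture_output=True, text=True,
).stdout

for osd in json.loads(raw):
    # BlueStore OSDs report "bluestore"; legacy FileStore OSDs
    # report "filestore".
    print(f"osd.{osd['id']}: {osd.get('osd_objectstore', 'unknown')}")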
Improved Tools
The Ceph developers have also significantly improved some tools, especially in terms of command-line output. The ceph command now displays progress bars for recovery and resync operations, which helps you maintain a clear overview. The view of all the placement groups (PGs) in the cluster that ceph pg dump conjures onto your screen is still quite long, but it now has fewer columns, so it will at least fit on widescreen displays.
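If the table is still too wide for your taste, you can reduce it to the columns you actually care about in a few lines of Python. The sketch below assumes the ceph CLI is available and that the JSON layout matches recent releases (older versions keep pg_stats at the top level instead of nesting it under pg_map), so treat the field names as assumptions.

#!/usr/bin/env python3
# Sketch: reduce 'ceph pg dump' to a handful of columns.
import json
import subprocess

raw = subprocess.run(
    ["ceph", "pg", "dump", "--format", "json"],
    check=True, capture_output=True, text=True,
).stdout

data = json.loads(raw)
# Recent releases nest the per-PG records under "pg_map";
# older ones keep "pg_stats" at the top level.
pg_stats = data.get("pg_map", data).get("pg_stats", [])

for pg in pg_stats:
    print(pg["pgid"], pg["state"], pg.get("acting", []))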
PGs are also handled differently. The PG is Ceph's unit for logically bundling objects and distributing them across the cluster's storage devices (i.e., the OSDs). In the early days of Ceph, administrators had to define the total number of PGs in the cluster manually. One problem was that several formulas were in circulation for calculating the ideal number (one common rule of thumb is sketched below), and it was initially impossible to increase or reduce the number of existing PGs retroactively while keeping the same number of OSDs.
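One of the better known rules of thumb aimed for roughly 100 PGs per OSD, divided by the replication factor and rounded up to a power of two. The following sketch merely illustrates that old heuristic; it is not an authoritative sizing method, and the numbers are examples.

# Sketch of the classic PG sizing rule of thumb (illustration only).
def suggested_pg_count(num_osds, replica_count, target_pgs_per_osd=100):
    raw = num_osds * target_pgs_per_osd / replica_count
    # Round up to the next power of two, as the old guidance suggested.
    power = 1
    while power < raw:
        power *= 2
    return power

# Example: 12 OSDs with three replicas -> 400, rounded up to 512.
print(suggested_pg_count(12, 3))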
In the meantime, both capabilities have been added. If you stick to certain lower and upper limits, you can freely define the number of PGs in your cluster. However, the PG count still influences performance. Today, RADOS gives administrators an automatic PG scaler implemented as a ceph-mgr module, which has been enabled by default since Ceph Octopus. Messing around with the total number of placement groups is therefore likely to be a thing of the past for the majority of admins; only in very rare and unusual setups will you be able to tease more out of the installation than the autoscaler delivers.
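If you want to check what the autoscaler is doing, or switch it on explicitly for a single pool, something like the following sketch works. It assumes the ceph CLI is available; rbd is just an example pool name, and pg_autoscale_mode is the per-pool setting recent releases expose for this purpose.

#!/usr/bin/env python3
# Sketch: inspect the PG autoscaler and enable it for one pool.
import subprocess

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

# Show what the autoscaler would do (or has done) for each pool.
ceph("osd", "pool", "autoscale-status")

# Let the autoscaler manage the PG count of an individual pool
# ('rbd' is a placeholder pool name).
ceph("osd", "pool", "set", "rbd", "pg_autoscale_mode", "on")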