Management improvements, memory scaling, and EOL for FileStore
Refreshed Compression
As is Ceph's wont, the individual components come with a cornucopia of small but subtle changes. The OSDs now support compression for inter-OSD communication, exploiting the fact that many Ceph systems today have far more CPU power than they need, while planners are still reluctant, for cost reasons, to adopt the latest network technologies such as 400Gbps links. Ceph clusters that saturate their network at peak times can use this feature to buy themselves some breathing room at the expense of their CPUs. However, you should not have overly high expectations: Ultimately, a permanently overloaded network can only be fixed with faster hardware.
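If you want to experiment with the feature, on-the-wire compression between OSDs is controlled by messenger options. The following sketch assumes the option names introduced with Quincy (ms_osd_compress_mode and friends); check ceph config help before relying on them in production:

# Enable compression for inter-OSD traffic (option names as introduced
# in Quincy; verify them with: ceph config help ms_osd_compress_mode)
ceph config set osd ms_osd_compress_mode force
ceph config set osd ms_osd_compression_algorithm snappy
# Skip compression for small messages; the 1KB threshold is an example
ceph config set osd ms_osd_compress_min_size 1024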
Improved Overview for Logs
Another welcome change to the OSDs – hidden away in the changelog – is that they now log slow requests far more clearly than before. A slow request is basically any write operation to an OSD that does not complete successfully within a specified time period. Virtually any type of problem has the potential to generate many slow requests, whether disruptions on the network, problems with individual OSDs, or configuration errors.
The crux is that, in the default configuration, each client write to RADOS triggers three write operations: The first goes directly to the target OSD, which then sends replicas of the objects to the secondary OSDs. In the case of more extensive problems, this results in an extremely large number of slow requests in a short period of time that the cluster cannot work through. Until now, RADOS simply forwarded the corresponding messages to the ceph -w output, where they quickly became impossible to follow.
The new format bundles these messages, generating far less text on the terminal. If you need the old behavior (e.g., because monitoring scripts depend on it), you can reinstate it in the configuration.
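The exact name of the relevant option can vary between releases, so it is safest to look it up on your own cluster. The commands below are standard Ceph CLI calls, but the grep pattern is only a guess at where the switch lives:

# Search the available options for anything relating to slow requests
ceph config ls | grep -i slow
# The threshold after which a request counts as slow (default 30 seconds)
ceph config help osd_op_complaint_time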
Drilled Out Automation
Although some might turn their noses up at Ceph-Mgr and its cohort cephadm (Figure 3), others are happy to use the tool. Both positions are justified. On the one hand, you have to ask why Ceph bothers competing with products like Ansible when alternatives like ceph-ansible exist and work well. On the other hand, Ceph-Mgr has evolved into far more than a plain vanilla automation tool.
The tool's management services are not only perfectly adapted to Ceph and its needs but now also provide an interface for other programs to retrieve data from Ceph. For example, anyone who wants Prometheus to tap data from Ceph for visualization on modified dashboards (Figure 4) will now also be able to tap into the Manager (Ceph-Mgr), because it provides a native Prometheus interface with data in the appropriate format.
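The Prometheus exporter is implemented as a regular manager module, so if it is not already active, enabling it is a one-liner; the default port of 9283 is set by Ceph itself:

# Enable the built-in Prometheus exporter in the Manager
ceph mgr module enable prometheus
# Show the port the exporter listens on (9283 by default)
ceph config get mgr mgr/prometheus/server_port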
In Ceph 17.2, however, the management component has only seen minor updates. In addition to queries by way of the Prometheus protocol, a Simple Network Management Protocol (SNMP) interface is now also available for extracting various key figures from Ceph and processing them in other monitoring solutions.
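Under the hood, this takes the form of a gateway service that cephadm rolls out and that translates alerts into SNMP traps. The following is only a sketch – the service type, flags, and addresses are assumptions based on the Quincy orchestrator and should be checked against ceph orch apply --help on your version; credentials go into a separate spec file in a real deployment:

# Deploy an SNMP gateway (service type and flags are assumptions;
# the destination is the SNMP trap receiver in your environment)
ceph orch apply snmp-gateway --snmp-version=V2c \
  --destination=192.168.1.10:162 --port=9464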
Moreover, the developers have massively expanded the solution's ability to roll out multiple instances of individual Ceph services – such as the Manager component, metadata servers, and RADOS gateways – in parallel on the same systems. Additionally, the zap subcommand can automatically zap OSDs when they are removed from the cluster, ensuring that the existing data is reliably deleted. Finally, cephadm now supports a server agent mode, which should significantly increase the performance of the solution.
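The zap behavior maps to a simple orchestrator call; a minimal sketch, assuming the --zap flag of the Quincy orchestrator and an example OSD ID:

# Remove OSD 7 from the cluster and wipe its device once the data
# has been drained (--zap as provided by the Quincy orchestrator)
ceph orch osd rm 7 --zap
# Follow the progress of the removal
ceph orch osd rm status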