Comparing Ceph and GlusterFS
Shared storage systems GlusterFS and Ceph compared
Tuning for GlusterFS and Ceph
To get the most out of GlusterFS, admins can start at several points. In terms of pure physics, network throughput and the speed of the disks behind each brick are decisive. Measures at this level are completely transparent to GlusterFS: whether it talks over a plain eth0 or a bonded bond0 interface makes no difference to the software, and faster disks on the back end simply help. Admins can also tune the filesystem underneath each brick, which pays off because GlusterFS stores the files entrusted to it 1:1 on the back end. It is not advisable to configure too many bricks per server. Beyond the hardware, there is also plenty to adjust at the protocol level.
Switching O_DIRECT on or off using the POSIX translator has already been mentioned. At the volume level, the read cache and the write buffers can be adjusted to suit your needs. The eager-lock switch is relatively new; it lets GlusterFS hand locks over from one transaction to the next more quickly. In general, the following rules apply: Relative performance grows with the number of clients, distributing work at the application level also benefits GlusterFS, and single-threaded applications should be avoided.
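These switches are set per volume. As a rough sketch, and not something taken from this article, the following Python snippet drives the gluster command-line tool to apply such options to a hypothetical volume named gv0; the option names follow the GlusterFS 3.x naming scheme and should be verified with gluster volume set help on your installation.

```python
# Rough sketch (not from the article): apply volume-level tuning knobs to a
# hypothetical volume "gv0" by calling the gluster CLI. Check the option names
# with "gluster volume set help" before use.
import subprocess

VOLUME = "gv0"  # assumed volume name

options = {
    "performance.cache-size": "256MB",              # read cache of the io-cache translator
    "performance.write-behind-window-size": "4MB",  # per-file write buffer
    "cluster.eager-lock": "on",                     # hand locks straight to the next transaction
}

for key, value in options.items():
    subprocess.run(["gluster", "volume", "set", VOLUME, key, value], check=True)
```

The values here are only placeholders; sensible sizes depend on the workload and on the RAM available on the brick servers.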
Ceph, in turn, gives administrators some interesting options with regard to a storage cluster's hardware. The previously described process of storing data on OSDs happens in the first step between the client and a single OSD, which accepts the binary objects from the client. The trick is that, on the client side, maintaining more than one connection to an OSD at the same time is not a problem. The client can therefore split a 16MB file into four objects of 4MB each and upload these four objects to different OSDs simultaneously. A Ceph client can thus write continuously to several spindles at once, bundling the performance of all the disks used, much as RAID0 does.
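For illustration only, here is a minimal sketch using the python-rados bindings (the pool name rbd, the object names, and the 4MB chunk size are assumptions of this example): it splits a 16MB buffer into four objects and writes them asynchronously, so that CRUSH can place them on different OSDs and the writes run on several spindles at once.

```python
# Minimal sketch (assumptions: python-rados installed, a reachable cluster,
# and an existing pool named "rbd"): write a 16MB buffer as four 4MB RADOS
# objects in parallel, so several OSDs receive data at the same time.
import rados

CHUNK = 4 * 1024 * 1024                      # 4MB per object
data = b"x" * (16 * 1024 * 1024)             # stand-in for a 16MB file

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd")            # assumed pool name

completions = []
for i in range(0, len(data), CHUNK):
    name = "bigfile.%d" % (i // CHUNK)       # one object per 4MB chunk
    completions.append(ioctx.aio_write_full(name, data[i:i + CHUNK]))

for c in completions:
    c.wait_for_complete()                    # wait until every object is acknowledged

ioctx.close()
cluster.shutdown()
```

In practice, the client libraries (librbd, CephFS, the RADOS gateway) do this striping automatically; the sketch merely shows why aggregate throughput scales with the number of OSDs involved.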
The effects are dramatic: Instead of the expensive SAS disks used in SAN storage, Ceph delivers comparable performance with normal SATA drives, which are much better value for the money. Latency may give some admins cause for concern, because the latency of SATA disks (especially desktop models) is significantly worse than that of comparable SAS disks. However, the Ceph developers have a solution for this problem, too, and it relates to the OSD journals.
Each OSD in Ceph has a journal, an upstream region that initially absorbs all changes and only later commits them to the actual data carrier. The journal can reside either directly on the OSD or on an external device (e.g., an SSD). Up to four OSD journals can be outsourced to a single SSD, which in turn has a dramatic effect on performance: In such a setup, clients write to the Ceph cluster at the speed that several SSDs can offer, and in terms of performance, the combination leaves even SAS drives well behind.
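In this kind of setup, the journal location is controlled by the osd journal setting in ceph.conf. The following hypothetical sketch, not taken from the article, generates the per-OSD sections that point the journals of four OSDs at partitions of a single SSD; the OSD IDs and partition paths are placeholders.

```python
# Hypothetical sketch: emit ceph.conf sections that move the journals of four
# OSDs onto partitions of one SSD. IDs and partition paths are placeholders.
SSD_JOURNALS = {
    0: "/dev/disk/by-partlabel/ceph-journal-0",
    1: "/dev/disk/by-partlabel/ceph-journal-1",
    2: "/dev/disk/by-partlabel/ceph-journal-2",
    3: "/dev/disk/by-partlabel/ceph-journal-3",
}

snippet = []
for osd_id, partition in sorted(SSD_JOURNALS.items()):
    snippet.append("[osd.%d]" % osd_id)
    snippet.append("    osd journal = %s" % partition)  # journal lives on the SSD partition

print("\n".join(snippet))  # append the output to /etc/ceph/ceph.conf as needed
```

Whether four journals per SSD is a good idea depends on the SSD's sustained write throughput, because every write to an OSD passes through its journal first.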
Conclusion
No real winner or loser emerges here. Both solutions have their own strengths and weaknesses; fortunately, they never fall in the same areas. Ceph is deeply rooted in the world of the object store and therefore plays its role particularly well as storage for hypervisors or open source cloud solutions. It looks slightly less impressive in the filesystem area. This, however, is where GlusterFS enters the game: Coming from the file-based NAS environment, it can leverage its strengths there, even in a production environment. GlusterFS only turned into an object store quite late in its career, so it still has to work like crazy to catch up.
Both tools feel comfortable in high-availability environments, although Ceph is less traditionally oriented than GlusterFS: GlusterFS also works with consumer hardware, but it feels a bit more at home on enterprise servers.
The "distribution layer" is quite different. The crown jewel of Ceph is RADOS and its corresponding interfaces. GlusterFS, however, impresses thanks to its much leaner filesystem layer that enables debugging and recovery from the back end. Additionally, the translators provide a good foundation for extensions. IT decision makers should look at the advantages and disadvantages of these solutions and compare them with the requirements and conditions of their data center. What fits best will then be the right solution.
Infos
- Ceph: http://www.ceph.com
- GlusterFS: http://www.gluster.org
- GPL 3.0: http://www.gnu.org/copyleft/gpl.html
- LGPL: http://www.gnu.org/licenses/lgpl.html
- Contributor License Agreement: http://en.wikipedia.org/wiki/Contributor_License_Agreement
- FUSE: http://fuse.sourceforge.net
- NFSv3: http://tools.ietf.org/html/rfc1813
- NFSv4: http://tools.ietf.org/html/rfc3530
- CTDB: http://ctdb.samba.org
- POSIX: http://www.opengroup.org/austin/papers/posix_faq.html
- libgfapi: http://github.com/gluster/glusterfs/tree/master/api
- Samba: http://www.samba.org
- OpenStack: http://www.openstack.org
- QEMU: http://wiki.qemu.org/Main_Page
- libgfapi-python: http://github.com/gluster/libgfapi-python/tree/master
- ROT13: http://de.wikipedia.org/wiki/ROT13
- Glupy: http://github.com/jdarcy/glupy
- Swift: http://docs.openstack.org/developer/swift