Software-defined storage with LizardFS
Designer Store
Quotas, Garbage, Snapshots
When many systems or users access LizardFS volumes, you can enable quotas to restrict the use of disk space. You can also enable recycle bins for LizardFS shares for users who are used to having them on Samba and Windows shares. Deleted files remain on the chunk servers until they exceed the configured retention time. Administrators can mount shares with special parameters, thus gaining access to the virtual recycle bins; unfortunately, there is no way to give the users themselves access to their previously deleted data.
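For example, with the LizardFS 3.x command-line tools, the configuration could look something like the following sketch; the user ID, limits, and paths are placeholders, and the exact argument order of setquota should be checked against the man page of your release:

# Per-user quota on a mounted volume; see lizardfs-setquota(1) for the
# exact order of the soft/hard size and inode limits in your version
lizardfs setquota -u 1000 <soft-size> <hard-size> <soft-inodes> <hard-inodes> /mnt/lizardfs

# Keep deleted files recoverable for seven days (trash time in seconds)
lizardfs settrashtime -r 604800 /mnt/lizardfs/projects

# Mount the meta filesystem as root to reach the virtual recycle bin;
# deleted entries then appear under /mnt/lizardfs-meta/trash
mfsmount -o mfsmeta /mnt/lizardfs-meta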
Snapshots offer another approach to storing files. A command to duplicate a file to a snapshot is particularly efficient because the master server copies only the metadata. Only when the content of the original starts to differ from the snapshot does the chunk server modify the appropriate blocks.
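In LizardFS 3.x, this is done with the makesnapshot command (older releases ship it as mfsmakesnapshot); the paths here are only examples:

# Creates a metadata-only copy; chunks are duplicated lazily,
# only when the original or the snapshot is modified
lizardfs makesnapshot /mnt/lizardfs/projects /mnt/lizardfs/projects-snap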
Thanks to replication goals and topologies, you can create files so that copies are archived on a desired chunk server or group of chunk servers. LizardFS can natively address tape drives, so you can equip a group of chunk servers with Linear Tape-Open (LTO) storage media, thus ensuring that your storage system always keeps certain data on tape and that clients can read it if necessary.
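Such goals are defined on the master by combining chunk server labels – typically in mfsgoals.cfg, with the labels themselves assigned through the LABEL option in each chunk server's mfschunkserver.cfg. The goal IDs, names, and labels in the following sketch are only examples:

# /etc/mfs/mfsgoals.cfg on the master: goal ID, goal name, and the labels
# of the chunk servers that should hold the copies
11 two_sites : site_a site_a site_b
12 archive   : site_a tape

# Pin a directory tree to the goal that includes the tape-backed chunk servers
lizardfs setgoal -r archive /mnt/lizardfs/archive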
Compared
LizardFS competes with other SDS solutions, such as Ceph and GlusterFS. Ceph is primarily an object store – comparable to the Amazon S3 API – that can also provide block devices; its POSIX-compliant CephFS filesystem is more of an overlay on the object store than a robust filesystem in its own right. CephFS was only declared ready for production at the end of April 2016, so long-term results are not available for a direct comparison, which in any case would be of little real value.
On paper, GlusterFS offers almost the same functionality as LizardFS, but it has been on the market since 2005 and thus enjoys an appropriate standing in the SDS community. Red Hat's distributed storage system offers many modes of operation that produce different levels of reliability and performance depending on the configuration.
GlusterFS offers configuration options at the volume level, whereas LizardFS defines the replication targets at the folder or file level. Both variants have advantages and disadvantages. In the case of GlusterFS, you need to opt for a variant when creating the volume, whereas you can change the replication modes at any time with LizardFS.
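As a rough illustration – with placeholder server, brick, and path names – the difference looks something like this:

# GlusterFS: the replica count is baked into the volume at creation time
gluster volume create vol01 replica 3 \
    srv1:/bricks/vol01 srv2:/bricks/vol01 srv3:/bricks/vol01

# LizardFS: raise or lower the goal per directory (or file) whenever you like
lizardfs setgoal -r 2 /mnt/lizardfs/scratch
lizardfs setgoal -r 3 /mnt/lizardfs/projects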
Security-conscious system administrators appreciate the ability of GlusterFS to encrypt a volume with a key. Only servers that have the correct key are then entitled to mount and decrypt the volume. Both GlusterFS and LizardFS are implemented on Linux clients as Filesystem in Userspace (FUSE) modules.
A LizardFS client only writes the data put in its care to a single chunk server; from there, the chunk servers replicate the data among themselves. With GlusterFS, however, the client handles the replication: The write operation occurs in parallel on all of the GlusterFS servers involved, and the client needs to make sure the replication succeeded everywhere, which results in poorer write performance, although this is of little consequence otherwise.
Whereas LizardFS always presents a single master to the client, GlusterFS clients can specify multiple servers when mounting a volume. If the first server in the list fails, the client independently accesses one of the other nodes, which is a clear advantage over LizardFS, where the admin cannot guarantee a failure-safe environment without a proprietary component or possibly a hand-crafted Pacemaker setup.
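A GlusterFS client mount with fallback servers might look like the following; the hostnames are placeholders, and the option name can differ slightly between GlusterFS versions:

# The client fetches the volume layout from srv1 but falls back to srv2 or srv3
mount -t glusterfs -o backup-volfile-servers=srv2:srv3 srv1:/vol01 /mnt/gluster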
Conclusions
LizardFS is not positioned as a competitor to the Ceph object store, but as an alternative to GlusterFS, and it is already running as a production SDS solution in setups with several hundred terabytes. LizardFS offers features that are likely to match the needs of many potential SDS users (see the "Evaluation and Experience" box); additionally, it can keep data in sync between multiple data centers. More excitement is in store if the developers stay true to their announcement and implement an S3-compatible API.
Evaluation and Experience
I have focused on SDS technologies for quite a while now and first encountered LizardFS in the fall of 2015. Since then, I have operated a LizardFS 2.6 pilot environment on Debian 7 and a LizardFS 3.10 system on Debian 8 in production. I have put both clusters through numerous tests that show how the SDS solution behaves in daily work and in failure scenarios.
The tests described in the article cover only a portion of the possible events, but they do show that the setups basically work and can deal with error cases, even though failover is sometimes slightly sluggish. One of the test scenarios involved rebooting the master to provoke a change of the master role to another node in a different data center. Concurrent read access from a client was interrupted for two seconds during the master switchover but then continued. In such cases, it is important that the accessing application can cope with such delays and that the master switchover happens as quickly as possible.
Another scenario assumes that clients know the chunk servers at their local data center and prefer them over chunk servers at another site for read access. While a client in the local data center was reading, we disconnected the chunk servers at that site from the network; in response, the client sourced the data from chunk servers at the other site without a noticeable delay. As soon as the disconnected chunk servers came back online, the client automatically switched back and read the data from the original nodes.
However, testing also revealed a limitation: In LizardFS, the admin sets goals that define how many copies of a file the distributed filesystem should store, and you would normally run at least as many chunk servers as the goal requires, if not more. If tests, reboots, crashes, or other failures leave too few chunk servers online to meet the goal for a new file immediately, the master refuses the write operation. In the tested scenario, only two chunk servers were available despite a replication goal of three. A LizardFS developer confirmed this behavior and promised to initiate an internal discussion.
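Until that changes, it is worth checking that enough chunk servers are connected to satisfy the configured goal before relying on new writes – for example, with the admin tool; the master hostname and port here are examples:

# List the chunk servers currently connected to the master
lizardfs-admin list-chunkservers master.example.com 9421

# Compare the result with the goal configured for the directory in question
lizardfs getgoal /mnt/lizardfs/projects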