Software-defined storage with LizardFS
Designer Store
Most experienced IT professionals equate storage with large appliances that combine software and hardware from a single vendor. In the long term, such boxes can be inflexible: They only scale to a predetermined limit and are difficult to migrate to solutions from other vendors. Enter software-defined storage (SDS), which abstracts the storage function from the hardware.
Distributed SDS filesystems turn a collection of quite different conventional servers with hard disks or solid-state drives (SSDs) into a pool in which each system contributes free storage and, depending on the solution, takes on different tasks. Scaling is easy right from the start: You just add more hardware. Flexibility is a given because of independence from the hardware vendor and the ability to respond quickly to growing demand. Modern storage software also has fail-safes that ensure data remains available across server boundaries.
Massive Differences
The open source sector offers solutions such as Lustre, GlusterFS, Ceph, and MooseFS. They are not functionally identical; Ceph, for example, focuses on object storage. One much sought-after feature is an SDS that provides a POSIX-compatible filesystem: From the client's perspective, the distributed filesystem [1] then behaves much like an ordinary local disk.
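Because the client sees a POSIX filesystem, applications need no special API; ordinary file operations simply work against the mountpoint. The following minimal Python sketch assumes a distributed filesystem such as LizardFS is already mounted at the hypothetical path /mnt/lizardfs; nothing in it is LizardFS-specific, which is exactly the point.

# Minimal sketch: a POSIX-compatible SDS mount behaves like a local disk.
# Assumption: the client has mounted the pool at /mnt/lizardfs.
import os

MOUNTPOINT = "/mnt/lizardfs"   # hypothetical mountpoint

path = os.path.join(MOUNTPOINT, "hello.txt")

# Ordinary POSIX calls: create, write, read, stat, delete.
with open(path, "w") as f:
    f.write("stored on a distributed filesystem\n")

with open(path) as f:
    print(f.read().strip())

print("size in bytes:", os.stat(path).st_size)
os.remove(path)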
Some storage software solutions are backed by companies such as Xyratex (Lustre) and Red Hat (GlusterFS and Ceph). Others depend on just a few developers and at times see virtually no maintenance, or none at all. The MooseFS project [2] (Core Technology), for example, showed hardly any activity in the summer of 2013, and the system, launched in mid-2008, looked like a one-man project without a long-term strategy or an active community. It was precisely this apparent abandonment of MooseFS that prompted a handful of developers to fork it and continue development under the GPLv3 license. LizardFS [3] was born.
The developers see LizardFS as a distributed, scalable filesystem with enterprise features such as fault tolerance and high availability. About 10 main developers work on the software independently under the umbrella of Warsaw-based Skytechnology [4]. If you want to install it, you can build LizardFS from the sources or pick up packages for Debian, Ubuntu, CentOS, and Red Hat from the download page [5]. Various distributions have also officially included the software in recent months.
Components and Architecture
The LizardFS design separates the metadata (e.g., file names, locations, and checksums) from the data, which prevents inconsistencies and supports atomic actions at the filesystem level. The system routes all operations through a master, which stores all the metadata and is the central point of contact for server components and clients.
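The split between metadata and payload can be pictured with a toy model: One process answers only the question "where does this file live?" while other processes hold the actual bytes. The Python sketch below is purely conceptual and greatly simplified; the class and method names are invented for illustration and do not mirror LizardFS internals.

# Conceptual toy model of the master/chunk server split (not LizardFS code).
# The master holds only metadata (file name -> chunk locations);
# chunk servers hold the actual bytes.

class ChunkServer:
    def __init__(self, name):
        self.name = name
        self.chunks = {}            # chunk_id -> bytes

    def store(self, chunk_id, data):
        self.chunks[chunk_id] = data

    def read(self, chunk_id):
        return self.chunks[chunk_id]

class Master:
    def __init__(self):
        self.metadata = {}          # file name -> [(server, chunk_id), ...]

    def register_file(self, name, placements):
        self.metadata[name] = placements   # one atomic metadata update

    def locate(self, name):
        return self.metadata[name]

# A client asks the master where the chunks live, then fetches the
# payload directly from the chunk servers.
cs1, cs2 = ChunkServer("cs1"), ChunkServer("cs2")
master = Master()

cs1.store("chunk-0", b"hello ")
cs2.store("chunk-1", b"world")
master.register_file("/docs/hello.txt", [(cs1, "chunk-0"), (cs2, "chunk-1")])

data = b"".join(server.read(cid) for server, cid in master.locate("/docs/hello.txt"))
print(data.decode())                # -> "hello world"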
To guard against a master failure, a second master can run in a shadow role. To set this up, you install a master on an additional server that remains passive. The shadow master continuously picks up all changes to the metadata and thus mirrors the state of the filesystem in its own RAM. If the active master fails, the system with the shadow master assumes the active role and provides all participants with information (Figure 1).
Automatic failover between the primary master and a theoretically unlimited number of secondary masters is missing in the open source version of LizardFS. Administrators either have to switch over manually or build their own failover mechanism (e.g., on a Pacemaker cluster). Because switching the master role only involves changing a configuration variable and reloading the master daemon, administrators with experience operating clusters will quickly develop their own solutions.
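Because promotion only means flipping the personality of the shadow master and reloading the daemon, a manual failover can be scripted in a few lines. The Python sketch below is an assumption-laden illustration rather than a finished tool: It presumes the role is set through a PERSONALITY key in /etc/mfs/mfsmaster.cfg and that the mfsmaster daemon accepts a reload action, as its MooseFS ancestor does; paths and key names may differ on your installation, so check them before use.

# Hedged sketch of a manual shadow-to-master promotion (run on the shadow host).
# Assumptions: the personality is configured via a PERSONALITY key in
# /etc/mfs/mfsmaster.cfg and "mfsmaster reload" re-reads the configuration.
import re
import subprocess

CONFIG = "/etc/mfs/mfsmaster.cfg"   # assumed default location

def promote_to_master(config_path=CONFIG):
    with open(config_path) as f:
        cfg = f.read()

    # Switch the personality from shadow to master (add the key if missing).
    if re.search(r"^\s*PERSONALITY\s*=", cfg, flags=re.M):
        cfg = re.sub(r"^\s*PERSONALITY\s*=.*$", "PERSONALITY = master",
                     cfg, flags=re.M)
    else:
        cfg += "\nPERSONALITY = master\n"

    with open(config_path, "w") as f:
        f.write(cfg)

    # Ask the running daemon to re-read its configuration.
    subprocess.run(["mfsmaster", "reload"], check=True)

if __name__ == "__main__":
    promote_to_master()

In a Pacemaker setup, these same two steps would typically live inside a resource agent, so the cluster manager decides when to promote the shadow instead of a human operator.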
Chunk Servers for Storage
The chunk servers are responsible for managing, storing, and replicating the data. All chunk servers are interconnected and combine their local filesystems to form a pool. LizardFS divides the data into chunks of a certain size, but from the clients' perspective, they remain ordinary files.
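How a file turns into chunks spread across the pool can be illustrated with a simplified Python sketch. The chunk size, the round-robin placement, and the replica count below are arbitrary example values chosen for readability, not LizardFS's actual parameters or placement logic.

# Conceptual sketch only: splitting a file into fixed-size chunks and
# replicating each chunk to several servers. Chunk size and replica count
# are arbitrary example values, not LizardFS defaults.
from itertools import cycle

CHUNK_SIZE = 4          # bytes, tiny on purpose so the output stays readable
REPLICAS = 2            # each chunk is stored on this many servers

def split_into_chunks(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]

def place_chunks(chunks, servers, replicas=REPLICAS):
    """Round-robin placement: each chunk goes to the next `replicas` servers."""
    ring = cycle(servers)
    placement = {}
    for i, chunk in enumerate(chunks):
        placement[i] = [(next(ring), chunk) for _ in range(replicas)]
    return placement

servers = ["chunkserver-a", "chunkserver-b", "chunkserver-c"]
chunks = split_into_chunks(b"the quick brown fox")

for idx, copies in place_chunks(chunks, servers).items():
    targets = ", ".join(name for name, _ in copies)
    print(f"chunk {idx} ({copies[0][1]!r}) -> {targets}")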
Like all other components, chunk servers can be installed on any Linux system. Ideally, they have fast storage media (e.g., serial-attached SCSI hard drives or SSDs) and export part of their filesystem to the storage pool. In a minimal case, a chunk server runs on a virtual machine and contributes a single filesystem (e.g., 20GB of ext4).
The metadata backup logger continuously collects the metadata changes (much like a shadow master) and should naturally run on its own system. Unlike a typical master, it does not keep the metadata in memory but stores it locally on the filesystem. In the unlikely event of a total failure of all LizardFS masters, a backup is thus available for disaster recovery.
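The reason a metadata backup logger is enough for disaster recovery is that the filesystem state can be rebuilt by replaying the logged metadata changes on top of the last metadata dump. The toy replay below only illustrates that principle; the record format and operation names are invented and have nothing to do with LizardFS's actual changelog format.

# Toy illustration of changelog replay (invented format, not LizardFS's).
# Starting from a metadata snapshot, applying the logged operations in order
# reproduces the filesystem tree as it was at the time of the failure.

snapshot = {"/": {}, "/etc": {}}                  # last full metadata dump

changelog = [                                     # operations logged since the dump
    ("MKDIR", "/data"),
    ("CREATE", "/data/report.txt"),
    ("RENAME", "/data/report.txt", "/data/report-final.txt"),
    ("UNLINK", "/etc"),
]

def replay(metadata, log):
    for op, *args in log:
        if op in ("MKDIR", "CREATE"):
            metadata[args[0]] = {}
        elif op == "RENAME":
            metadata[args[1]] = metadata.pop(args[0])
        elif op == "UNLINK":
            metadata.pop(args[0], None)
    return metadata

print(sorted(replay(dict(snapshot), changelog)))
# -> ['/', '/data', '/data/report-final.txt']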
None of the components places strict requirements on the underlying system. Only the master server should have a bit more memory, depending on the number of files it manages.