SDS configuration and performance
Put to the Test
Frequently, technologies initially used in large data centers end up at some point in time in smaller companies' networks or even (as in the case of virtualization) on ordinary users' desktops. This process can be observed for software-defined storage (SDS), as well.
SDS basically converts hard drives from multiple servers into a large, redundant storage environment. The idea is that storage users have no need to worry about which specific hard drive their data is on. Equally, if individual components crash, users should be confident that the landscape has consistently saved the data so that it is always accessible.
This technology makes little sense in an environment with only one file server, but it is much more useful in large IT environments, where you can implement the scenario professionally with dedicated servers and combinations of SSDs and traditional hard drives. Usually, the components connect with each other and the clients over a 10Gb network.
However, SDS now also provides added value if several servers with idle disk space are waiting in small or medium-sized businesses. In this case, it can be interesting to combine this space using the distributed filesystems in a redundant array.
Candidate Lineup
Linux admins can immediately access several variants of such highly available, distributed filesystems. Well-known examples include GlusterFS [1] and Ceph [2], whereas LizardFS is relatively unknown [3]. In this article, I analyze the three systems and compare the read and write speeds in the test network in a benchmark.
Speed may be key for filesystems, but a number of other features are of interest, too. For example, depending on your use of the filesystem, sometimes sequential writing, sometimes creating new files quickly, or sometimes even random reading of different data is crucial. Benchmark results can provide information about a system's strengths and weaknesses.
The three candidates in the test follow different approaches. Ceph is a distributed object store that can also be used as a filesystem in the form of CephFS [4]. GlusterFS and LizardFS, on the other hand, are designed as filesystems; however, although just two nodes are enough to operate a Gluster setup, LizardFS needs an additional control node, for which it has a web interface (Figure 1) that informs you about the state of the cluster.
First, I take a look at the overhead required to install and configure the filesystems. Second, I analyze data throughput measured with read/write tests.
The Test Setup
The lab setup consisted of a client with Ubuntu Linux 16.04 LTS connected to two storage servers with plenty of hard disks (Table 1). The operating system running on the file servers was CentOS 7, with Ubuntu 16.04 LTS on a fourth admin computer that served CephFS and LizardFS as a monitor or master server.
Table 1
Test Hardware
Server | CPU | RAM | Storage |
---|---|---|---|
File server 1 | Intel Xeon X5667 with 3GHz and 16 cores | 16GB | Disk array T6100S with 10 Hitachi drives (7200rpm) configured as RAID 1 |
File server 2 | Intel Core i3-530 with 2.9GHz and four cores | 16GB | iSCSI array Thecus with two 320GB hard drives configured as RAID 1 |
Admin server | Intel Core 2 E6700 with 2.66GHz and two cores | 2GB | Not specified |
Client | Intel Core 2 E6320 with 1.86GHz and two cores | 2GB | Not specified |
File server 1 had a storage array connected via SCSI, whereas file server 2 was connected to another array via iSCSI. Communication between the servers was with Gigabit Ethernet, while the client and the admin server each only had a 100Mbps interface. This setup did not meet high-performance requirements, although it is not uncommon in smaller companies. The setup also allowed the candidates to test on equal terms. Version 1.97 of Bonnie++ [5] and IOzone 3.429 [6] were the test tools. To begin, I launched each tool on the client and then released them on the respective test candidate's mounted filesystem.
GlusterFS
GlusterFS was the easiest to install among the participants. I only needed to install the software on the Ubuntu client and the two storage servers. For the installation on CentOS, I added the EPEL repository [7] and called yum update
.
Version 3.8.5 of the software then installed with the centos-release-gluster
and glusterfs-server
packages. To start the service, I enabled glusterd
via systemctl
. Once the service was started on both storage servers, I checked for signs of life using:
gluster peer probe <IP/Hostname_of_Peer> peer probe: success.
The response in the second line indicates that everything is going according to plan. An error message would probably have indicated a lack of name resolution.
The tester generates the filesystem on the existing filesystems of the storage servers to activate it in the second step:
gluster volume create lmtest replica 2 transport tcp <Fileserver_1>:/<Mountpoint> <Fileserver_2>:/<Mountpoint> gluster volume start lmtest
To use the created volume, I still need the glusterfs-client
package, which provides the kernel drivers and tools needed to integrate a volume and is then executed with the command:
mount.glusterfs <Fileserver_1>:/lmtest /mnt/glusterfs
The filesystem is then available on the client for performance tests.
Buy this article as PDF
(incl. VAT)