Distributed storage with Sheepdog
Data Tender
Come In – The Doors Are Open
Although Sheepdog is primarily aimed at QEMU, you now have some options for storing normal data in a Sheepdog cluster. If you come from the SAN camp, you might like to check out the iSCSI setup. The trick here lies in the backing-store parameter. If the iSCSI daemon runs on one of the sheep, you can simply use the sheep process's corresponding Unix socket. Otherwise, you have to reference the IP address and port of a Sheepdog computer. However, the use of multiple paths has not yet been implemented.
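The two variants of the backing-store parameter can be sketched as follows. This is a minimal example under assumptions: I assume the tgt Sheepdog backend accepts the forms unix:<socket>:<vdi> and tcp:<host>:<port>:<vdi>, the socket path is hypothetical, and the target ID and LUN are arbitrary. The script only prints the tgtadm calls instead of running them.

```shell
#!/bin/sh
# Sketch: build the --backing-store value for a tgt logical unit that is
# backed by a Sheepdog VDI. The "unix:" and "tcp:" forms are assumptions
# about the tgt Sheepdog backend, not taken from the article.

# sheepdog_backing_store unix <socket-path> <vdi>
# sheepdog_backing_store tcp  <host> <port> <vdi>
sheepdog_backing_store() {
    case "$1" in
        unix) printf 'unix:%s:%s\n' "$2" "$3" ;;
        tcp)  printf 'tcp:%s:%s:%s\n' "$2" "$3" "$4" ;;
        *)    echo "usage: sheepdog_backing_store unix|tcp ..." >&2
              return 1 ;;
    esac
}

# Print (do not run) the tgtadm call that would attach the VDI as
# LUN 1 of target 1; both numbers are arbitrary examples.
print_lun_command() {
    printf 'tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 --bstype sheepdog --backing-store %s\n' "$1"
}

# iSCSI daemon on a sheep: local Unix socket (hypothetical path).
print_lun_command "$(sheepdog_backing_store unix /var/lib/sheepdog/sock eins.img)"

# iSCSI daemon elsewhere: IP address and port of a Sheepdog computer.
print_lun_command "$(sheepdog_backing_store tcp 192.168.1.210 7000 eins.img)"
```

Printing the commands first makes it easy to check the colon-separated value before you touch a running target.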
Support for the NFS [12] and HTTP [13] protocols is under development. The former can only handle version 3 and TCP. In the lab, I failed to create a stable cluster. HTTP only serves as a basic framework for the implementation of the Swift interface. With the -r option, you can tell the sheep daemon the IP address and port on which the associated web server is listening, and what size the intermediate buffer has to be for the data transfer.
Incidentally, Swift is the last remaining weak point in the OpenStack Sheepdog freestyle exercise [14]. The other two storage components of the open source cloud, Glance [15] and Cinder [16], already cooperate with the flock of sheep.
So, of course, do QEMU and libvirt [17]. The open source emulator and virtualizer can manage images directly in the Sheepdog cluster and use them as virtual disks. If you use the NBD protocol [18], you can use it with Sheepdog, too: The server part, qemu-nbd, is ready for it. The fact that libvirt understands "the language of the sheep" is actually a logical consequence of QEMU functionality.
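A short sketch shows how QEMU tools address an image in the cluster. It assumes the sheepdog:<host>:<port>:<vdi> notation that qemu-img uses in Listing 5 and a QEMU build that includes the Sheepdog block driver. The image size and the guest options are arbitrary examples, and the commands are only printed.

```shell
#!/bin/sh
# sheepdog_image <host> <port> <vdi>
# Build the image specification in the notation used with qemu-img
# in Listing 5 (sheepdog:<host>:<port>:<vdi>).
sheepdog_image() {
    printf 'sheepdog:%s:%s:%s\n' "$1" "$2" "$3"
}

img=$(sheepdog_image 192.168.1.236 7000 ntestvm1.img)

# Only print the commands; remove the leading echo to run them
# against a real cluster.
echo qemu-img create "$img" 8G
echo qemu-img info "$img"
echo qemu-system-x86_64 -m 1024 -drive "file=$img,if=virtio"
```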
Libvirt can store disk images as well as complete storage pools in the flock of sheep (Figure 3). The only thing missing is integration with tools such as virt-manager [19].
Last, but not least, I'll take a look at sheepfs. This is a kind of POSIX layer for the Sheepdog cluster, for both the actual storage objects and the status information. The associated filesystem driver is not part of the kernel. Not unexpectedly, FUSE technology [20] is used here. In principle, sheepfs is only the representation of the dog management tool in the form of directories and files (Listing 4).
Listing 4
Using sheepfs
# mount | grep sheepfs
sheepfs on /sheep type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
#
# dog vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  eins.img     0  4.0 MB  0.0 MB  0.0 MB 2014-03-01 10:15   36467d       1
#
# cat /sheep/vdi/list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  eins.img     0  4.0 MB  0.0 MB  0.0 MB 2014-03-01 10:15   36467d       1
#
# dog node list
  Id   Host:Port            V-Nodes        Zone
   0   192.168.1.210:7000       128  3523324096
   1   192.168.1.211:7000       128  3540101312
   2   192.168.1.212:7000       128  3556878528
   3   192.168.1.236:7000         0  3959531712
#
# cat /sheep/node/list
  Id   Host:Port            V-Nodes        Zone
   0   192.168.1.210:7000       128  3523324096
   1   192.168.1.211:7000       128  3540101312
   2   192.168.1.212:7000       128  3556878528
   3   192.168.1.236:7000         0  3959531712
#
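Because sheepfs exposes the status information as ordinary files, standard text tools are enough to evaluate it. The following sketch counts the nodes in a node list and how many of them carry virtual nodes. The function name summarize_nodes is my own, and the sample data is copied from Listing 4; on a system with sheepfs mounted under /sheep, you could feed it /sheep/node/list instead.

```shell
#!/bin/sh
# summarize_nodes: read a node list in the format of Listing 4 on stdin
# and report the number of nodes and how many have V-Nodes > 0.
summarize_nodes() {
    awk '
        # Data rows start with a numeric Id; this skips the header line.
        $1 ~ /^[0-9]+$/ && NF >= 4 {
            nodes++
            if ($3 > 0) with_vnodes++
        }
        END { printf "nodes=%d with_vnodes=%d\n", nodes, with_vnodes }
    '
}

# Sample data from Listing 4. With sheepfs mounted, use instead:
#   summarize_nodes < /sheep/node/list
summarize_nodes <<'EOF'
  Id   Host:Port            V-Nodes        Zone
   0   192.168.1.210:7000       128  3523324096
   1   192.168.1.211:7000       128  3540101312
   2   192.168.1.212:7000       128  3556878528
   3   192.168.1.236:7000         0  3959531712
EOF
# prints: nodes=4 with_vnodes=3
```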
What Else?
Sheepdog handles the standard disciplines of virtual storage, such as snapshots and clones, without any fuss. Normally, QEMU manages the appropriate actions, but the objects can also be managed at the sheep level. This is true of creating, displaying, deleting, or rolling back snapshots. Sheepdog admins need to pay attention, however, because the output from dog vdi list changes. The "Used" column now shows only the difference between the objects related as snapshots. Cascading filesystem snapshots are also possible (Listing 5).
Listing 5
Snapshots in Sheepdog
# dog vdi list
  Name          Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
s ntestvm1.img   4  8.0 GB  0.0 MB  2.7 GB 2014-02-05 15:04   982a39       2  feb.snap
s ntestvm1.img   5  8.0 GB  292 MB  2.4 GB 2014-03-01 11:42   982a3a       2  mar.snap
s ntestvm1.img   6  8.0 GB  128 MB  2.6 GB 2014-03-10 19:48   982a3b       2  mar2.snap
  ntestvm1.img   0  8.0 GB  276 MB  2.5 GB 2014-03-10 19:49   982a3c       2
# dog vdi tree
ntestvm1.img---[2014-02-05 15:04]---[2014-03-01 11:42]---[2014-03-10 19:48]---(you are here)
#
# qemu-img snapshot -l sheepdog:192.168.1.236:7000:ntestvm1.img
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
4         feb.snap                  0 2014-03-01 11:42:30   00:00:00.000
5         mar.snap                  0 2014-03-10 19:48:38   00:00:00.000
6         mar2.snap                 0 2014-03-10 19:49:58   00:00:00.000
#
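To see what a snapshot chain costs in total, you can add up the Used column. The sketch below does this for a table in the format of Listing 5. It is an illustration under assumptions: the function name total_used_mb is my own, I assume that rows may start with a one-letter flag such as "s", and I assume the units KB, MB, GB, and TB with a factor of 1024 between them.

```shell
#!/bin/sh
# total_used_mb: sum the "Used" column of a `dog vdi list` table read
# from stdin and print the result in MB.
total_used_mb() {
    awk '
        # Convert a value with a unit to MB (factor 1024 is an assumption).
        function to_mb(value, unit) {
            if (unit == "KB") return value / 1024
            if (unit == "GB") return value * 1024
            if (unit == "TB") return value * 1024 * 1024
            return value                      # MB
        }
        {
            # Locate the numeric Id column to find out whether the row
            # starts with a flag column.
            if ($2 ~ /^[0-9]+$/)      o = 0   # no flag in front
            else if ($3 ~ /^[0-9]+$/) o = 1   # flag such as "s" in front
            else next                         # header or unrelated line
            total += to_mb($(5 + o), $(6 + o))
        }
        END { printf "%.1f MB\n", total }
    '
}

# Sample data from Listing 5; on a cluster node use: dog vdi list | total_used_mb
total_used_mb <<'EOF'
  Name          Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
s ntestvm1.img   4  8.0 GB  0.0 MB  2.7 GB 2014-02-05 15:04   982a39       2  feb.snap
s ntestvm1.img   5  8.0 GB  292 MB  2.4 GB 2014-03-01 11:42   982a3a       2  mar.snap
s ntestvm1.img   6  8.0 GB  128 MB  2.6 GB 2014-03-10 19:48   982a3b       2  mar2.snap
  ntestvm1.img   0  8.0 GB  276 MB  2.5 GB 2014-03-10 19:49   982a3c       2
EOF
# prints: 696.0 MB
```

The sum covers only the data unique to each object; the Shared column is deliberately left out, because shared data would otherwise be counted several times.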
Sheepdog uses copy-on-write procedures for snapshots and clones. Thus, the derived storage objects only consume space for data that has changed. For reasons of data consistency, Sheepdog only allows cloning from snapshots. Don't bother looking for encryption or compression, however, and it does not look as if this situation will change any time soon. Instead, the developers point to the use of the appropriate formats for the virtual disks [21].
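At the sheep level, the snapshot and clone operations mentioned above could look like the following sketch. The VDI names and the tag are examples of my own, and the subcommands reflect the dog syntax as I know it, so check dog vdi --help on your version before relying on them. By default, the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# run: print a command and execute it only if DRY_RUN=0 is set.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# snapshot_and_clone <vdi> <tag> <clone-name>
# Names and tag are hypothetical; the dog subcommands are assumptions
# that may differ between Sheepdog versions.
snapshot_and_clone() {
    vdi=$1
    tag=$2
    clone=$3
    run dog vdi snapshot -s "$tag" "$vdi"        # create a snapshot
    run dog vdi list "$vdi"                      # display the result
    run dog vdi clone -s "$tag" "$vdi" "$clone"  # clone from the snapshot
    run dog vdi rollback -s "$tag" "$vdi"        # roll back to the snapshot
}

snapshot_and_clone ntestvm1.img apr.snap ntestvm2.img
```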
At the End of the Day
Sheepdog is a very dynamic project with some potential. Integration with libvirt, Cinder, and Glance, and the ongoing work in the Swift area, clearly show this. The separation of the cluster part is interesting. A small Corosync setup is also quickly accomplished, but for professional use in the data center, Sheepdog still needs to become more mature. Topics such as geo-replication, encryption, or fire area concepts play an important role there, and Ceph and GlusterFS are already much more advanced in these areas.
Partial integration into OpenStack is a positive point for the project, but Swift integration must come quickly if Sheepdog does not want to lose touch here. The tool is definitely worth testing in your own lab (see the "Start – With Prudence" box). This is even more true if Corosync or Zookeeper is already in use. If your existing solution for distributed storage leaves nothing to be desired, however, you have nothing to gain by trying out Sheepdog.
Start – With Prudence
For your first steps with Sheepdog, you will definitely want to grab the current version from the Git repository. This version is often significantly more recent than the one provided by your Linux distributor. On the wiki page of the project [22], you'll find useful instructions for setting up the clusterware and the Sheepdog cluster. However, here is also where the dark side of the project rears its ugly head: Quite a lot of the documents on the Internet still refer to obsolete commands or command lines. The central tool, for example, is now dog and not collie. Some features described here are disabled in the default configuration. Additionally, it's worth studying the input parameters for the configure call. The online help could be more detailed. Sometimes your only way out is through trial and error or even reading the source code.
Infos
- [1] Sheepdog: http://sheepdog.github.io/sheepdog/
- [2] QEMU: http://www.qemu.org/
- [3] RFC 4391: http://tools.ietf.org/html/rfc4391
- [4] Corosync: http://corosync.github.io/corosync/
- [5] Zookeeper: http://zookeeper.apache.org/
- [6] RFC 6151: http://www.ietf.org/rfc/rfc6151.txt
- [7] RFC 6234: http://www.ietf.org/rfc/rfc6234.txt
- [8] Consistent hashing and random trees: http://dl.acm.org/citation.cfm?id=258660
- [9] FNV hash: http://www.isthe.com/chongo/tech/comp/fnv/index.html
- [10] Change hash function from SHA1 to FNV-1a: http://lists.wpkg.org/pipermail/sheepdog/2009-December/000097.html
- [11] Erasure coding: https://github.com/sheepdog/sheepdog/wiki/Erasure-Code-Support
- [12] RFC 1813: http://www.ietf.org/rfc/rfc1813.txt
- [13] RFC 2616: http://www.ietf.org/rfc/rfc2616.txt
- [14] OpenStack: http://www.openstack.org/
- [15] Glance: http://docs.openstack.org/developer/glance/
- [16] Cinder: http://docs.openstack.org/developer/cinder/
- [17] libvirt: http://libvirt.org/
- [18] Network Block Device: http://nbd.sourceforge.net/
- [19] Virtual machine manager: http://virt-manager.org/
- [20] FUSE: http://fuse.sourceforge.net/
- [21] Appropriate formats: http://github.com/sheepdog/sheepdog/wiki/Which-Format-of-QEMU-Images-Should-I-Run
- [22] Sheepdog wiki: http://github.com/sheepdog/sheepdog/wiki