Distributed storage with Sheepdog

Data Tender

Come In – The Doors Are Open

Although Sheepdog is primarily aimed at QEMU, you now have some options for storing ordinary data in a Sheepdog cluster. If you come from the SAN camp, you might like to check out the iSCSI setup. The trick here lies in the backing-store parameter: If the iSCSI daemon runs on one of the sheep, you can simply point it at the sheep process's Unix socket; otherwise, you have to reference the IP address and port of a Sheepdog node. Multipathing, however, has not been implemented yet.
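
For illustration, an export via the Linux target framework tgt might look roughly like the following sketch. The target name and the exact backing-store syntax (Unix socket plus VDI name) are assumptions derived from the description above and may differ in your version; the VDI name is taken from Listing 4.

# tgtadm --lld iscsi --op new --mode target --tid 1 \
    --targetname iqn.2014-03.org.example:sheepdog
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
    --bstype sheepdog --backing-store unix:/var/lib/sheepdog/sock:eins.img
# tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL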

Support for the NFS [12] and HTTP [13] protocols is under development. The former only handles NFS version 3 over TCP, and in the lab I failed to create a stable cluster with it. HTTP currently serves only as the basic framework for the implementation of the Swift interface. With the -r option, you tell the sheep daemon the IP address and port on which the associated web server listens and how large the intermediate buffer for data transfers should be.
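
The following line is a minimal sketch of starting a sheep with this HTTP framework enabled; the exact format of the -r argument (address, port, and buffer size) is an assumption based on the description above and may well differ in your Sheepdog version.

# sheep -r 192.168.1.210:8000,buffer=64M /var/lib/sheepdog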

Incidentally, Swift is the last remaining weak point in Sheepdog's OpenStack integration [14]. The other two storage components of the open source cloud, Glance [15] and Cinder [16], already cooperate with the flock of sheep.

So, of course, do QEMU and libvirt [17]. The open source emulator and virtualizer can manage images directly in the Sheepdog cluster and use them as virtual disks. If you rely on the NBD protocol [18], you can combine it with Sheepdog, too: qemu-nbd handles the server part. That libvirt understands "the language of the sheep" is a logical consequence of the QEMU functionality.
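
For example, creating a VDI directly in the cluster with qemu-img and then exporting it over NBD might look like this sketch; the host and port are taken from the listings in this article, while the image name and the NBD device are assumptions.

# qemu-img create sheepdog:192.168.1.236:7000:disk0.img 8G
# modprobe nbd
# qemu-nbd -c /dev/nbd0 sheepdog:192.168.1.236:7000:disk0.img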

Libvirt can store individual disk images as well as complete storage pools in the flock of sheep (Figure 3). The only thing missing is integration with tools such as virt-manager [19]. Last, but not least, I'll take a look at sheepfs, a kind of POSIX layer for the Sheepdog cluster – both for the actual storage objects and for the status information. The associated filesystem driver is not part of the kernel; unsurprisingly, FUSE technology [20] is used here. In essence, sheepfs simply exposes the dog management tool as a hierarchy of directories and files (Listing 4).

Figure 3: Excerpt from the XML description of a virtual server that has entrusted its hard drive to the sheep.
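
A disk definition of this kind might look roughly like the following excerpt (a sketch: the VDI name and host come from the listings in this article, and the target device is an assumption):

<disk type='network' device='disk'>
  <driver name='qemu'/>
  <source protocol='sheepdog' name='ntestvm1.img'>
    <host name='192.168.1.236' port='7000'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>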

Listing 4

Using sheepfs

# mount |grep sheepfs
sheepfs on /sheep type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
#
# dog vdi list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  eins.img     0  4.0 MB  0.0 MB  0.0 MB 2014-03-01 10:15   36467d       1
#
# cat /sheep/vdi/list
  Name        Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
  eins.img     0  4.0 MB  0.0 MB  0.0 MB 2014-03-01 10:15   36467d       1
#
# dog node list
  Id            Host:Port   V-Nodes       Zone
   0   192.168.1.210:7000       128 3523324096
   1   192.168.1.211:7000       128 3540101312
   2   192.168.1.212:7000       128 3556878528
   3   192.168.1.236:7000         0 3959531712
#
# cat /sheep/node/list
  Id            Host:Port   V-Nodes       Zone
   0   192.168.1.210:7000       128 3523324096
   1   192.168.1.211:7000       128 3540101312
   2   192.168.1.212:7000       128 3556878528
   3   192.168.1.236:7000         0 3959531712
#
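
The mount shown at the top of Listing 4 was created beforehand with the sheepfs helper. A minimal sketch, assuming a sheep daemon running locally on the default port:

# mkdir -p /sheep
# sheepfs /sheep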

What Else?

Sheepdog handles the standard disciplines of virtual storage, such as snapshots and clones, without any fuss. Normally, QEMU triggers the appropriate actions, but the objects can also be managed at the sheep level. This is true of creating, displaying, deleting, and rolling back snapshots. Sheepdog admins need to pay attention, however, because the output of dog vdi list changes: The "used" disk space then only shows the difference between objects related as snapshots. Cascading snapshots are also possible (Listing 5).
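
At the dog level, the corresponding actions might look like the following sketch; the tag and image names match those in Listing 5, and the exact options may vary between Sheepdog versions.

# dog vdi snapshot -s mar2.snap ntestvm1.img
# dog vdi rollback -s mar.snap ntestvm1.img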

Listing 5

Snapshots in Sheepdog

# dog vdi list
          Name    Id    Size    Used  Shared    Creation time   VDI id  Copies  Tag
s ntestvm1.img     4  8.0 GB  0.0 MB  2.7 GB 2014-02-05 15:04   982a39       2  feb.snap
s ntestvm1.img     5  8.0 GB  292 MB  2.4 GB 2014-03-01 11:42   982a3a       2  mar.snap
s ntestvm1.img     6  8.0 GB  128 MB  2.6 GB 2014-03-10 19:48   982a3b       2  mar2.snap
  ntestvm1.img     0  8.0 GB  276 MB  2.5 GB 2014-03-10 19:49   982a3c       2
# dog vdi tree
ntestvm1.img---[2014-02-05 15:04]---[2014-03-01 11:42]---[2014-03-10 19:48]---(you are here)
#
# qemu-img snapshot -l sheepdog:192.168.1.236:7000:ntestvm1.img
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
 4        feb.snap                  0 2014-03-01 11:42:30   00:00:00.000
 5        mar.snap                  0 2014-03-10 19:48:38   00:00:00.000
 6        mar2.snap                 0 2014-03-10 19:49:58   00:00:00.000
#

Sheepdog uses copy-on-write for both snapshots and clones; thus, the derived storage objects only consume space for data that has changed. For reasons of data consistency, Sheepdog only allows snapshots to be cloned. Don't bother looking for encryption and compression, though, and it does not look as if this situation will change any time soon. Instead, the developers point to the use of appropriate formats for the virtual disks [21].
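
Cloning a snapshot into a new, independently writable VDI might look like this (a sketch: the snapshot tag and source image come from Listing 5, the name of the new image is an assumption):

# dog vdi clone -s mar2.snap ntestvm1.img ntestvm2.img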

At the End of the Day

Sheepdog is a very dynamic project with some potential. The integration with libvirt, Cinder, and Glance and the ongoing work on Swift clearly show this. The separation of the cluster communication part is interesting, and a small Corosync setup is quickly accomplished. For professional use in the data center, however, Sheepdog still needs to mature. Topics such as geo-replication, encryption, or fire zone (failure domain) concepts play an important role there – Ceph and GlusterFS are already much further along.

The partial integration into OpenStack is a plus for the project, but Swift integration must come quickly if Sheepdog does not want to lose ground here. The tool is definitely worth testing in your own lab (see the "Start – With Prudence" box) – even more so if Corosync or ZooKeeper is already in use. If your existing solution for distributed storage leaves nothing to be desired, however, you have nothing to gain by trying out Sheepdog.

Start – With Prudence

For your first steps with Sheepdog, you will definitely want to grab the current version from the Git repository; it is often significantly more recent than the one provided by your Linux distribution. On the project's wiki pages [22], you'll find useful instructions for setting up the clusterware and the Sheepdog cluster itself. However, this is also where the dark side of the project rears its ugly head: Quite a few of the documents on the Internet still refer to obsolete commands or command-line options – the central tool, for example, is now called dog, not collie. Some features described here are disabled in the default configuration, so it's worth studying the input parameters of the configure call. The online help could be more detailed; sometimes your only way out is trial and error or even reading the source code.
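
A typical build from the Git sources might look like the following sketch. The repository URL and the autotools-based steps reflect the usual procedure; as noted above, check the configure options before building, because they vary between versions.

# git clone https://github.com/sheepdog/sheepdog.git
# cd sheepdog
# ./autogen.sh
# ./configure --help     # study the available options first
# ./configure
# make && make install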

The Author

Udo Seidel teaches math and physics and has been a fan of Linux since 1996. After completing his PhD, he worked as a Linux/Unix trainer, system administrator, and senior solution engineer. Today, he is the Linux strategy team lead with Amadeus Data Processing GmbH in Erding, Germany.
