Coordinating distributed systems with ZooKeeper

Relaxed in the Zoo

Know Your Nodes

After connecting, the admin sees a prompt like the one in the last line of Listing 1. The ls / command shows which znodes already exist. If the desired node is not among them, the admin can create it:

$ create /test HelloWorld!
Created /test

The content (i.e., HelloWorld!) can be retrieved with the get /test command (shown in Listing 2), which also outputs the znode's metadata.

Listing 2

get /test

[zk: localhost:2181(CONNECTED) 11] get /test
HelloWorld!
cZxid = 0xa1f54
ctime = Sun Jul 20 15:22:57 CEST 2014
mZxid = 0xa1f54
mtime = Sun Jul 20 15:22:57 CEST 2014
pZxid = 0xa1f54
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 11
numChildren = 0

In contrast to a normal distributed filesystem, ZooKeeper supports the concepts of volatile (ephemeral) and sequential (sequence) znodes. An ephemeral znode disappears when its owner's session ends.

Admins typically use ephemeral nodes to discover hosts in a distributed system: each server announces its IP address via an ephemeral node. If a server loses contact with ZooKeeper and its session ends, the information disappears along with the node.
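A minimal Java sketch of this announcement pattern might look like the following; the /servers parent path, the address, and the session timeout are illustrative assumptions, and the parent znode is presumed to exist already:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ServerAnnouncer {
    public static void main(String[] args) throws Exception {
        // Connect; the lambda is a no-op default watcher for session events.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, event -> {});

        // Publish this server's IP address as an ephemeral znode. The node
        // vanishes automatically as soon as this session ends.
        // (Assumes the parent znode /servers already exists.)
        zk.create("/servers/server-1", "192.0.2.10".getBytes(),
                  Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        Thread.sleep(Long.MAX_VALUE); // keep the session, and thus the node, alive
    }
}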

ZooKeeper automatically appends a monotonically increasing sequence number to the name of each newly created sequential znode. Leader election is easy to implement in ZooKeeper, because servers can publish their information in znodes that are sequential and volatile at the same time (see the "Leader Election" box).

Leader Election

In the context of a leader election, a distributed system selects one process that handles special tasks and, for example, coordinates the other processes. In ZooKeeper, each candidate creates an ephemeral, sequential znode, and the candidate holding the lowest sequence number becomes the leader. If the leader or any other server goes offline, its session ends and its volatile nodes are removed.

Thanks to the sequence numbers, the other servers quickly learn who the new leader is; at the same time, the pattern of volatile, sequential znodes organizes all the nodes in a queue that is visible to everyone. The same principle can be applied to distributed locks of any kind, with any number of participants. Such locks prevent multiple processes from accessing the same resource at the same time.
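A hedged Java sketch of this election pattern follows; the /election parent path and the payload are assumptions for illustration, and the parent znode must already exist:

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElection {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, event -> {});

        // Every candidate publishes itself as an ephemeral, sequential znode;
        // ZooKeeper appends the sequence number to the chosen name.
        String myNode = zk.create("/election/candidate-",
                                  "192.0.2.10".getBytes(), Ids.OPEN_ACL_UNSAFE,
                                  CreateMode.EPHEMERAL_SEQUENTIAL);

        // The candidate that owns the lowest sequence number is the leader.
        List<String> candidates = zk.getChildren("/election", false);
        Collections.sort(candidates);
        if (myNode.endsWith(candidates.get(0)))
            System.out.println("This server is the leader");
        else
            System.out.println("Current leader: " + candidates.get(0));
    }
}

In practice, each non-leader would also set a watch on the candidate directly ahead of it in the queue, so that the failure of a predecessor is noticed immediately.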

To create such a node, which is both ephemeral and sequential, the admin uses the -e (ephemeral) and -s (sequence) flags:

$ create -e -s /myEphemeralAndSequentialNode \
  ThisIsASequentialAndEphemeralNode
Created /myEphemeralAndSequentialNode0000000001

The contents of these nodes can be read with get, as demonstrated in Listing 3; again, the metadata is output along with the content.

Listing 3

Volatile and Sequential Nodes

[zk: localhost:2181(CONNECTED) 14] get /myEphemeralAndSequentialNode0000000001
ThisIsASequentialAndEphemeralNode
cZxid = 0xa1f55
[...]

Messenger Service

Another key feature of ZooKeeper is watchers for znodes, which you can use to set up a notification system. Each client that wants to track a particular znode registers a watcher for it; the client is then notified when the znode's content changes.

To set a watch on an existing znode, you run the stat command with the watch parameter:

$ stat /test watch
cZxid = 0xa1f60
[...]

If you connect with ZooKeeper from a different terminal, you can change the znode:

$ set /test ByeCruelWorld!
cZxid = 0xa1f60
[...]

The watcher set up in the first session now sends a message via the command line:

WATCHER::
WatchedEvent state:SyncConnected type:NodeDataChanged path:/test

A user on the system thus discovers that the content of the znode has changed and can view the new content if they so desire.

However, there is a downside: watchers fire only once. To receive further updates, you need to re-register the watcher, and an update could be lost in the meantime. You can detect such a gap by checking the znode's version numbers. If every single change counts, sequential nodes are recommended instead.
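The following Java sketch, with the /test path and the timeout as assumptions, shows the usual workaround: the watcher re-registers itself on every event and compares dataVersion values to detect updates that slipped through between firing and re-registering:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class Rewatcher implements Watcher {
    private final ZooKeeper zk;
    private int lastVersion = -1;

    public Rewatcher(ZooKeeper zk) throws Exception {
        this.zk = zk;
        read(); // sets the initial watch
    }

    // Read the znode and re-register this object as the (one-shot) watcher.
    private void read() throws Exception {
        Stat stat = new Stat();
        byte[] data = zk.getData("/test", this, stat);
        // A dataVersion jump of more than one means updates were missed
        // while no watch was registered.
        if (lastVersion >= 0 && stat.getVersion() > lastVersion + 1)
            System.out.println("Missed " + (stat.getVersion() - lastVersion - 1)
                               + " update(s)");
        lastVersion = stat.getVersion();
        System.out.println("/test = " + new String(data));
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try { read(); } catch (Exception e) { e.printStackTrace(); }
        }
    }
}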

Zoo Rules

ZooKeeper guarantees order by making every update part of a fixed sequence. Although not all clients are necessarily at the same point in that sequence at any given moment, they all see the updates in the same order. This approach makes it possible to make write access dependent on a particular version of a znode: If two clients attempt to update the same znode at the same version, only one of the updates prevails, because the znode receives a new version number after the first update. This makes it easy to set up distributed counters and apply small updates to node data safely.
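A compare-and-set counter built on this guarantee might look like the following Java sketch; the znode is assumed to exist and to contain a decimal integer:

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class Counter {
    // Increment an integer stored in a znode: read value and version,
    // then write back conditioned on exactly that version.
    public static void increment(ZooKeeper zk, String path) throws Exception {
        while (true) {
            Stat stat = new Stat();
            int value = Integer.parseInt(new String(
                zk.getData(path, false, stat), StandardCharsets.UTF_8));
            try {
                // Succeeds only if no other client updated the znode meanwhile.
                zk.setData(path,
                           String.valueOf(value + 1).getBytes(StandardCharsets.UTF_8),
                           stat.getVersion());
                return;
            } catch (KeeperException.BadVersionException e) {
                // Another client won the race; re-read and retry.
            }
        }
    }
}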

ZooKeeper also offers the possibility of carrying out several update operations in one fell swoop. It applies them atomically, which means that it performs either all of the operations or none of them.

If you want to feed ZooKeeper with data that needs to remain consistent across one or more znodes, one option is to use the multiupdate API. However, you should bear in mind that this API is not as convenient as ACID transactions in traditional SQL databases: You cannot simply type BEGIN TRANSACTION, because you have to specify the expected version numbers of the znodes involved.
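A hedged sketch of such a multiupdate in Java follows; the paths and the expected version numbers are placeholders:

import java.util.Arrays;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class MultiUpdate {
    public static void update(ZooKeeper zk) throws Exception {
        // Either all three operations are applied or none of them is. If any
        // expected version does not match the znode's current dataVersion,
        // the entire call fails.
        zk.multi(Arrays.asList(
            Op.check("/config", 3),                    // guard: /config unchanged
            Op.setData("/test", "new".getBytes(), 7),  // conditional update
            Op.create("/test/child", new byte[0],
                      Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)));
    }
}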
