Getting started with the Apache Cassandra database

Believable

Cluster Playground

As a final test of whether everything is working as desired, you can use Portainer to open a Bash shell on one of the node containers and enter:

nodetool status

Nodetool is a comprehensive tool for managing, monitoring, and repairing a Cassandra cluster. The output of the status subcommand should look like Figure 5: All three nodes must appear, their status should be Up (U) and their state Normal (N), and each node should own a roughly equal share of the data.

Figure 5: This output from the nodetool status command indicates that everything is OK.
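If you prefer the command line to Portainer, the same check can be run from the Docker host with docker exec. This is only a sketch; the container name depends on your Compose project name, so check what docker ps reports:

$ docker exec -it cassandra_DC1N1_1 nodetool status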


Listing 7

docker-compose.yml

version: '3'
services:
    # Configuration for the seed node DC1N1
    # The name could stand for datacenter 1, node 1
    DC1N1:
        image: cassandra:3.10
        command: bash -c 'if [ -z "$$(ls -A /var/lib/cassandra/)" ] ; then sleep 0; fi && /docker-entrypoint.sh cassandra -f'
        # Network for communication between nodes
        networks:
            - dc1ring
        # Map the volume to a local directory.
        volumes:
            - ./n1data:/var/lib/cassandra
        # Environment variables for the Cassandra configuration.
        # CASSANDRA_CLUSTER_NAME must be identical on all nodes.
        environment:
            - CASSANDRA_CLUSTER_NAME=Test Cluster
            - CASSANDRA_SEEDS=DC1N1
        # Expose ports for cluster communication
        expose:
            # Intra-node communication
            - 7000
            # TLS intra-node communication
            - 7001
            # JMX
            - 7199
            # CQL
            - 9042
            # Thrift service
            - 9160
        # Recommended Cassandra ulimit settings
        ulimits:
            memlock: -1
            nproc: 32768
            nofile: 100000

    DC1N2:
        image: cassandra:3.10
        command: bash -c 'if [ -z "$$(ls -A /var/lib/cassandra/)" ] ; then sleep 60; fi && /docker-entrypoint.sh cassandra -f'
        networks:
            - dc1ring
        volumes:
            - ./n2data:/var/lib/cassandra
        environment:
            - CASSANDRA_CLUSTER_NAME=Test Cluster
            - CASSANDRA_SEEDS=DC1N1
        depends_on:
            - DC1N1
        expose:
            - 7000
            - 7001
            - 7199
            - 9042
            - 9160
        ulimits:
            memlock: -1
            nproc: 32768
            nofile: 100000

    DC1N3:
        image: cassandra:3.10
        command: bash -c 'if [ -z "$$(ls -A /var/lib/cassandra/)" ] ; then sleep 120; fi && /docker-entrypoint.sh cassandra -f'
        networks:
            - dc1ring
        volumes:
            - ./n3data:/var/lib/cassandra
        environment:
            - CASSANDRA_CLUSTER_NAME=Test Cluster
            - CASSANDRA_SEEDS=DC1N1
        depends_on:
            - DC1N1
        expose:
            - 7000
            - 7001
            - 7199
            - 9042
            - 9160
        ulimits:
            memlock: -1
            nproc: 32768
            nofile: 100000

    # A web-based GUI for managing containers.
    portainer:
        image: portainer/portainer
        networks:
            - dc1ring
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock
            - ./portainer-data:/data
        # Access to the web interface from the host via
        # http://localhost:10001
        ports:
            - "10001:9000"
networks:
    dc1ring:
Finally, you can start playing around with the cluster. From the Docker host, log into the DC1N1 node created in Listing 7 by specifying its IP address and the CQL port reserved in the configuration (9042 by default):

$ cqlsh -uadmin 172.21.0.2 9042

This command requires a local Cassandra installation on the host, because the CQL shell (cqlsh) ships as part of Cassandra. Alternatively, you can use Portainer to open a Linux shell on one of the nodes and launch cqlsh there. Next, entering

CREATE KEYSPACE pingtest WITH replication = {'class':'SimpleStrategy','replication_factor':3};

creates the keyspace for the ping round-trip time example on the three-node cluster with three replicas.
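To verify that the keyspace really was created with the expected replication settings, you can query it straight from cqlsh. The following is just a quick sketch; both commands are standard in Cassandra 3.x:

DESCRIBE KEYSPACE pingtest;
SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = 'pingtest';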

High Availability, No Hard Work

Without your having to do anything else, the database now stores a copy of every row in the pingtest keyspace on every node in the cluster (a replication factor of 3 on three nodes). For experimental purposes, the consistency level can be set to different values, either interactively for each session or in the queries issued by pingtest.pl.

A value of one (Listing 6, line 17) means, for example, that only one replica node needs to acknowledge the read or write operation for it to be considered successful. Settings of two or three require that many replicas, and all requires every replica to respond. A quorum setting means that a majority of the replicas across all the data centers over which the cluster is distributed must confirm the operation, whereas local_quorum is satisfied by a majority of the replicas in the data center that hosts the coordinator node for the query.
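In cqlsh, for example, the level can be switched per session with the CONSISTENCY command; a short sketch:

CONSISTENCY;         -- show the current level (ONE by default)
CONSISTENCY QUORUM;  -- require a majority of the replicas
CONSISTENCY ONE;     -- back to a single replica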

In this way, Cassandra achieves what is known as tunable consistency, which means that users can specify for each individual query what matters more to them. In an Internet of Things application, confirmation by a single node might be sufficient, the upside being a performance boost because the database does not have to wait for further nodes to confirm the result. With a financial application, on the other hand, you might prefer to play it safe and have the result of the operation confirmed by a majority of the nodes, accepting that this takes a few milliseconds longer.

Once you have set up the pingtime table (Listing 6, line 15) in the distributed pingtest keyspace and have enabled the cron job described earlier so that data flows in, you should be able to retrieve the same data from the table on every node. If this is the case, replication is working as intended.
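One quick way to check is to run the same query against each node from the Docker host. The addresses of the second and third nodes below are assumptions; read the real ones from Portainer or docker inspect:

$ cqlsh 172.21.0.2 9042 -e "SELECT count(*) FROM pingtest.pingtime;"
$ cqlsh 172.21.0.3 9042 -e "SELECT count(*) FROM pingtest.pingtime;"
$ cqlsh 172.21.0.4 9042 -e "SELECT count(*) FROM pingtest.pingtime;"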

You could now use Portainer to shut down individual nodes. Depending on the consistency setting, selects will then provoke an error message if you address a surviving node that holds the data locally but cannot find enough peers to confirm the result.
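A rough sketch of such a test follows; the container name is an assumption, so adjust it to whatever docker ps reports:

$ docker stop cassandra_DC1N3_1
cqlsh> CONSISTENCY ALL;
cqlsh> SELECT count(*) FROM pingtest.pingtime;

With all, the select should now fail because one replica does not answer; switching back to one lets the same query succeed.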

If you then reactivate a node some time later, no further steps are required in the ideal case; Cassandra takes care of resynchronization on its own. The prerequisite is that the hinted_handoff_enabled variable in the central cassandra.yaml configuration file is set to true. Cassandra then stores, in the form of hints, the write operations the node missed because of its temporary failure and automatically replays them as soon as the node becomes available again. For more complicated cases, such as a node that was down for longer than hints are retained, nodetool has a repair command.
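Inside a node's shell (opened via Portainer or docker exec), you can check the setting and trigger a manual repair roughly like this; the configuration path matches the official Cassandra Docker image:

$ grep hinted_handoff_enabled /etc/cassandra/cassandra.yaml
$ nodetool repair pingtest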

If you browse the Cassandra documentation [5], you will find many interesting commands and procedures that you can try on a small test cluster to get a feel for the database that will certainly pay dividends when you go live.

Conclusions

Cassandra is a powerful distributed NoSQL database especially suited for environments that need to handle large volumes of data and grow rapidly. Several advanced features, such as built-in failover, automatic replication, and self-healing after node failure, make it interesting for a wide range of application scenarios. However, migration from a traditional relational database management system (RDBMS) to Cassandra is unlikely to be possible without some overhead because of the need to redesign the data structure and queries.

Infos

  1. Lakshman, A., and P. Malik. Cassandra – A decentralized structured storage system. ACM SIGOPS Operating Systems Review, 2010; 44(2):35-40, http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
  2. Apache Cassandra: http://cassandra.apache.org
  3. DataStax Constellation: https://constellation.datastax.com
  4. Cassandra image: https://hub.docker.com/_/cassandra
  5. Cassandra documentation: https://docs.datastax.com/en/cassandra/3.0/index.html
