Kubernetes StatefulSet
Hide and Seek
Implementing Failover
With a StatefulSet or Kubernetes deployment, you don't have to worry too much about application availability. If a database pod crashes, Kubernetes starts a new one within seconds and connects it to the existing persistent storage. In a Kubernetes cluster, this also covers the crash of a host, with Kubernetes then launching the new pods on another node. Complicated active-passive constructs with quorum and Pacemaker rules are no longer needed.
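A minimal StatefulSet for a single database pod with persistent storage might look like the following sketch; the name `mariadb`, the image tag, and the secret are placeholders, not part of any particular setup:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mariadb
spec:
  serviceName: mariadb
  replicas: 1
  selector:
    matchLabels:
      app: mariadb
  template:
    metadata:
      labels:
        app: mariadb
    spec:
      containers:
      - name: mariadb
        image: mariadb:11
        env:
        - name: MARIADB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mariadb-secret
              key: root-password
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  # Kubernetes reattaches this PVC to the replacement pod after a crash
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

The `volumeClaimTemplates` section is what makes the failover work: the replacement pod gets the same persistent volume claim, and with it the existing database files.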
How you back up the databases themselves from the persistent volume depends on the storage back end and your backup strategy. For example, you can have a sidecar container (second container in the same pod) or a separate pod access the PV in read-only mode and back up the physical files from there. In that case, you would need to stop the database in the database pod itself at the right times to close the database files correctly on disk. Built-in tools work best, such as timed database dumps or a second pod that acts as a replication receiver to mirror all transactions asynchronously from the first database server. You can find online how-tos for MariaDB [1] and PostgreSQL [2].
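The timed database dumps mentioned above map naturally onto a Kubernetes CronJob. This sketch assumes a MariaDB service named `mariadb`, a secret `mariadb-secret`, and a backup PVC `backup-pvc` (all names are invented for illustration):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mariadb-dump
spec:
  schedule: "0 2 * * *"          # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: dump
            image: mariadb:11
            command:
            - /bin/sh
            - -c
            # logical dump to the backup volume; no need to stop the DB
            - mysqldump --all-databases -h mariadb
              -p"$MARIADB_ROOT_PASSWORD" > /backup/dump-$(date +%F).sql
            env:
            - name: MARIADB_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mariadb-secret
                  key: root-password
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: backup-pvc
```

Unlike a physical file copy, a logical dump does not require stopping the database, because the server delivers a consistent snapshot itself.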
Replication nodes also enable a rudimentary scale-out scenario: you can launch multiple read-only mirror pods of the SQL database and point application pods that only need read access at them, which takes the load off the main database pod.
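On the Kubernetes side, routing read-only traffic to the mirror pods is just a second service with a narrower label selector; the `role: replica` label is an assumption you would set on the mirror pods yourself:

```yaml
# Read-only service: selects only the replica pods,
# while the main service points at the primary
apiVersion: v1
kind: Service
metadata:
  name: mariadb-ro
spec:
  selector:
    app: mariadb
    role: replica
  ports:
  - port: 3306
```

Application pods that only read then use `mariadb-ro` as their database host.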
Scale-Out Dimensioning
One of the essential functions of a relational database is resolving the relations between tabulated data (queries with JOIN) in real time. For this reason, a SQL server needs to keep the complete inventory of the database on the queried node. Concepts such as scale-out and data sharding, wherein parts of the data are distributed across several nodes and no node has all the information, are difficult to implement here.
A little-known database named NuoDB [3] promises full SQL functionality with a scale-out architecture. NuoDB relies on two types of nodes: the storage manager with persistent volumes and the transaction engine. In very simplified terms, the transaction nodes act as an in-memory cache for the storage nodes. If you want to test NuoDB, you will find a suitable Helm chart for the community edition, which supports a maximum of three nodes, on the project's website. However, it only worked in the test environment with Microshift running on a fairly powerful machine – NuoDB requires 8GB of RAM per node pod. Genuine scale-out remains the domain of NoSQL databases, which rely on completely different approaches to data and query structures.
Cassandra: NoSQL with a Structure
The open source Apache Cassandra project saves data in tables like a SQL server. It relies on its own Cassandra Query Language (CQL) for queries; as the name suggests, CQL is closely modeled on SQL. Unlike SQL, however, CQL has no relations and therefore no JOIN queries that pull data from multiple linked tables.
Therefore, Cassandra can redundantly distribute the tables to multiple nodes. If a single node crashes, the system restores the lost datasets from the redundant copies. If you add more nodes, the system also automatically rebalances the tables to match the number of nodes and the redundancy specifications. However, much like a SQL database, Cassandra requires structured data and data types.
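The redundancy specification is set per keyspace in CQL; the keyspace name here is a placeholder:

```cql
-- Every row in this keyspace is stored on three nodes;
-- Cassandra rebalances automatically when nodes join or leave
CREATE KEYSPACE shop
  WITH replication = {'class': 'SimpleStrategy',
                      'replication_factor': 3};
```

In production, `NetworkTopologyStrategy` would typically replace `SimpleStrategy` to spread the replicas across failure domains.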
Cassandra is well suited as a SQL replacement for applications that handle data management in the application itself and do not require higher level SQL functions such as JOIN to do so. If you can handle simple create, read, update, delete (CRUD) operations on your own, you can easily switch your application from SQL to CQL and change the database back end.
A Cassandra cluster can easily be launched with a StatefulSet on Kubernetes [4]. With the right configuration, the nodes automatically find each other; the replica count determines the size of the cluster. If you want to try Cassandra on a test cluster like the Microshift cluster referred to earlier, you need to have enough free RAM, because Cassandra is written in Java and is a little memory hungry. The setup for this article ran on an old laptop with a Core i7 and 16GB of RAM – and that was fine.