Kubernetes StatefulSet

Hide and Seek

Article from ADMIN 73/2023
Legacy databases are regarded as stateful applications and, theoretically, not a good fit for containers. We reveal how classic SQL can still work well on Kubernetes and the database options available to SMEs for scale-out environments.

Many enterprises are looking to migrate legacy monolithic applications to scale-out architectures with containers. Unfortunately, this step turns out to be far more complicated than many IT departments would like. A cloud-native architecture requires that the various components of an application communicate with each other on a message bus to allow for distribution and scaling across different nodes. The big challenge for developers is: Where does the application store its data persistently?

Up to now, SQL databases have mainly handled this task. However, very few SQL implementations can run as clusters in a scale-out architecture. Application developers now have a choice: Either run the existing database technology in a container or switch to a scale-out database. However, this choice is only available to large companies with their own software developers. Small and medium-sized enterprises (SMEs), on the other hand, tend to work with off-the-shelf tools and are forced to use one of the databases that their application supports.

Despite the steadily increasing number of NoSQL databases for scale-out architectures, many web applications still only support one or more classic SQL databases. The most popular candidates include MariaDB (MySQL), PostgreSQL, and Microsoft SQL. In this article, I discuss how to run these SQL classics reliably in container environments with Docker, Podman, or Kubernetes and present a few interesting NoSQL approaches.

Trusted Sources

In a genuine scale-out application, the failure of a single container should not affect the application as such. Therefore, early implementations had no function to provide persistent (i.e., non-volatile) mass storage for a single container. Unfortunately, the SQL classics run the database server in a single container: If the container fails, the database is lost, and it doesn't help that the container platform can restart the container in a fraction of a second.

The platform therefore needs to provide the container a reliable mass storage device on which it can store its data and that will survive the demise of a container. This storage is then usually mounted as an overlay on the container's filesystem at a specified location (e.g., /var/lib/mysql), and the container template for the respective database needs to provide the appropriate logic to evaluate correctly the content of the persistent storage.

If a container starts with empty persistent storage (and only then), the container template's housekeeping logic has to create the appropriate directory structure. If the container starts up with populated persistent storage, the application reads its data and configuration from the overlay. Where necessary, this logic also needs to trigger update or repair routines: If a database container crashes and corrupts the database files in the process, the container needs to check them for consistency on restart.
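
The following shell fragment is a minimal sketch of this housekeeping logic – not the official MariaDB entrypoint, just the idea behind it. The command names follow recent MariaDB releases (older images use mysql_install_db and mysqld instead):

#!/bin/bash
# Sketch only: initialize an empty volume, otherwise reuse the data
DATADIR=/var/lib/mysql
if [ -z "$(ls -A "$DATADIR")" ]; then
  # Empty persistent volume: create the system tables
  mariadb-install-db --user=mysql --datadir="$DATADIR"
else
  # Populated volume: this is where update or repair logic would hook in
  echo "existing data directory found, skipping initialization"
fi
# Hand over to the database server as PID 1
exec mariadbd --user=mysql --datadir="$DATADIR"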

Version upgrades work the same way: You could stop a container running database version x and start a container with version x+1. The new container then checks the dataset for its version and carries out an update without corrupting any data.

You have to be very careful where you pick up your container templates and just as careful if you build them yourself. Most database server vendors offer an official container template through one of the popular Docker or Kubernetes registries and on GitHub. The documentation specifies exactly what is included in the template and how it handles update and recovery scenarios. If you want to build your own template, check the GitHub repositories of the official builds to see what their entrypoint.sh scripts do before starting the database.

Small Steps to the Container

If you run containers on a single node with Docker or Podman, you will not usually assign separate IP addresses – at least not addresses that would be visible to other machines on the network. Instead, port forwarding routes one or more IP ports of the container to ports of the host system. A MariaDB container can forward its internal port 3306 directly to host port 3306 or to any other free port.
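
For example, a MariaDB container with port forwarding might be started like this (the image tag, password, and host directory are placeholders; pick your own values):

podman run -d --name maria \
  -p 3306:3306 \
  -e MYSQL_ROOT_PASSWORD=mysqlroot \
  -v /var/pods/maria:/var/lib/mysql:Z \
  docker.io/mariadb:10.8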

Alternatively, you could provide a bridge adapter to the container environment. The containers can then be managed directly with individual IP addresses like virtual machines (VMs). In such an environment, a "dedicated" MariaDB container could be run as a replacement for a MariaDB VM:

podman run --name maria \
  --volume /var/pods/maria:/var/lib/mysql:Z \
  --net pub_net --ip 192.168.1.10 \
  --mac-address 12:34:56:78:9a:bc \
  docker.io/mariadb:latest

The host system /var/pods/maria directory contains the database files of the container. If it crashes, the data is retained. The whole setup works without port forwarding. The pub_net defined by the administrator runs on a network bridge. All the systems on the local network can access the MariaDB container by way of its IP address.
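
How pub_net gets created depends on your environment. One common approach is a macvlan network that attaches containers directly to the physical LAN; the parent interface name and address range below are assumptions you need to adapt:

podman network create --driver macvlan \
  --opt parent=eth0 \
  --subnet 192.168.1.0/24 --gateway 192.168.1.1 \
  pub_net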

The code points to the mariadb:latest image, which means each restart of the container can trigger an update of the database – including major version jumps. For production environments, you will therefore always want to specify the MariaDB version number and only update after appropriate testing. Image tags let you specify just the major release or an exact point release (e.g., mariadb:10 or mariadb:10.8.3). However, a setup like this with a "generic" database server that many clients on the local area network address directly is not seen very often. In far more cases, a database server supports just a single application.

The second example shows how to run a MariaDB server on Kubernetes (see the box "Kubernetes Test with Microshift"). The example uses a StatefulSet to ensure that an application with state always has its persistent storage available (Figures 1 and 2). To begin, create the mariadb-state.yml file (Listing 1), which starts with the Service definition.

Listing 1

mariadb-state.yml

01 ---
02 apiVersion: v1
03 kind: Service
04 metadata:
05   name: mariadb
06   labels:
07     app: mariadb
08 spec:
09   type: NodePort
10   ports:
11   - port: 3306
12     protocol: TCP
13   selector:
14     app: mariadb
15 ---
16 apiVersion: v1
17 kind: ConfigMap
18 metadata:
19   name: mariadb
20   labels:
21     app: mariadb
22 data:
23   MYSQL_ROOT_PASSWORD: mysqlroot
24   MYSQL_DATABASE: db1
25   MYSQL_USER: mysqluser
26   MYSQL_PASSWORD: mysqlpwd
27 ---
28 apiVersion: apps/v1
29 kind: StatefulSet
30 metadata:
31   name: mariadb
32 spec:
33   serviceName: "mariadb"
34   replicas: 1
35   selector:
36     matchLabels:
37       app: mariadb
38   # (the pod template from Listing 2 is inserted here)
39   volumeClaimTemplates:
40   - metadata:
41       name: mariadb
42     spec:
43       accessModes: [ "ReadWriteOnce" ]
44       resources:
45         requests:
46           storage: 10Gi

Kubernetes Test with Microshift

Microshift, a lightweight OpenShift/Kubernetes variant that targets the niche between lean Linux edge devices and full OpenShift edge clusters, is useful as a Kubernetes environment for testing and development. To proceed, simply set up a Red Hat Enterprise Linux 8 (RHEL 8), CentOS Stream 8, or Fedora 35 VM with a minimal setup of two to four virtual CPUs (vCPUs) and 4 to 8GB of RAM and turn off the firewall. Then, you just need to type a few lines:

dnf module enable -y cri-o:1.21
dnf install -y cri-o cri-tools
systemctl enable crio --now
dnf copr enable -y @redhat-et/microshift
dnf install -y microshift
systemctl enable microshift --now

While the service starts and fetches the required container images from the Internet, you can download the oc and kubectl clients:

curl -O https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/ocp/stable/openshift-client-linux.tar.gz
tar -xf openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl

To check whether the back-end services are running, the command

crictl ps

should return a list of containers. If this is the case, copy the configuration file to your home directory:

mkdir ~/.kube
cat /var/lib/microshift/resources/kubeadmin/kubeconfig > ~/.kube/config

If you want to manage your Kubernetes server from another system (e.g., a Windows workstation), first copy the specified kubeconfig file to the client system, load the file into an editor, and look for the line:

server: https://127.0.0.1:6443

Replace 127.0.0.1 with the external IP address of your VM. The line occurs several times in the kubeconfig file. Next, install the kubectl and oc client tools you need for your workstation. You can then control your Microshift VM from your workstation.
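
On a Linux client, a quick sed one-liner takes care of all occurrences at once (replace the placeholder with your VM's external address):

sed -i 's/127.0.0.1/<VM-IP>/g' ~/.kube/config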

Figure 1: The MariaDB StatefulSet for Kubernetes creates persistent storage and starts the database container. A web front end with phpMyAdmin runs in the same namespace.
Figure 2: The phpMyAdmin front end in the same Kubernetes namespace as the database uses the service to communicate with the MariaDB pod.

Lines 1-14 declare the database port as a service. This definition can then be used as an easy way to connect other applications to the database server. Because this example is a single-node cluster and you want to address the MariaDB container directly, you need to create the service as a NodePort. Kubernetes then generates an automatic port mapping (Figure 3).
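
To see which node port Kubernetes assigned, query the service (assuming the mysql namespace created later in this example); the PORT(S) column then shows a mapping such as 3306:31234/TCP, where the second, randomly chosen number is the node port:

kubectl get service mariadb -n mysql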

Figure 3: In a single-node Kubernetes structure, NodePorts allow access to applications in containers over the local area network; Kubernetes clusters use routes instead.

The file continues with a ConfigMap (lines 16-26). The values specified as data correspond to the environment variable declarations (-e) that you would pass in at the Docker or Podman command line. However, the ConfigMap data is stored in plain text in the Kubernetes configuration. In a production environment, you would therefore move the passwords into a separate Secret.
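
A minimal sketch of such a Secret might look like this (the name and values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: mariadb-secret
type: Opaque
stringData:
  MYSQL_ROOT_PASSWORD: mysqlroot
  MYSQL_PASSWORD: mysqlpwd

The pod template then references it alongside the ConfigMap with an additional secretRef entry under envFrom:

envFrom:
- configMapRef:
    name: mariadb
- secretRef:
    name: mariadb-secret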

Now it's time for the StatefulSet itself (lines 28-37). This section declares the Kubernetes pod with its name and the number of replicas. Because MariaDB in this configuration does not support active-active mirroring, the pod only has one replica, and the StatefulSet makes sure that exactly one pod is running at any given time. If it crashes for any reason, Kubernetes automatically starts a new pod within seconds. Listing 2 shows the pod template, which slots into the StatefulSet spec at line 38 of Listing 1.

Listing 2

MariaDB Pod Template

01 template:
02   metadata:
03     labels:
04       app: mariadb
05   spec:
06     containers:
07     - name: mariadb
08       image: mariadb:latest
09       ports:
10       - containerPort: 3306
11         name: mariadb
12       volumeMounts:
13       - name: mariadb
14         mountPath: /var/lib/mysql
15       envFrom:
16       - configMapRef:
17           name: mariadb

A pod can contain one or more containers that always run together and cannot scale separately. In most cases, however, a pod comprises a single container – here, mariadb:latest, although again you could pin a version tag instead. You could optionally specify quota rules at this point (i.e., the RAM size and CPU shares the container is given) as maximum or minimum values. The pod retrieves its environment variables from the ConfigMap declared earlier.
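
As a sketch, such quota rules would be added to the container entry in Listing 2; the values here are arbitrary examples you would tune to your workload:

      resources:
        requests:
          memory: "512Mi"
          cpu: "250m"
        limits:
          memory: "1Gi"
          cpu: "1"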

At this point, the volume mount point for the persistent volume (PV) is important. With volumeClaimTemplates (lines 39-46), the StatefulSet automatically generates a PV claim, which in turn creates the PV and binds it to the pod. To run the StatefulSet, simply create a namespace (called a project in OpenShift) and apply the YML file:

oc new-project mysql
oc create -f mariadb-state.yml

If you use the plain kubectl client instead, the commands are:

kubectl create namespace mysql
kubectl apply -f mariadb-state.yml
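
Either way, you can then check that the volume claim was bound and the pod is running:

kubectl get pvc,pods -n mysql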

PostgreSQL and Microsoft SQL

You can use exactly the same principle to create stateful sets for PostgreSQL or Microsoft SQL servers in containers. For PostgreSQL, use port 5432 instead of 3306 and the postgres:latest or postgres:14 image, and enter the following lines in the ConfigMap:

data:
   POSTGRES_DB: postgresdb
   POSTGRES_USER: admin
   POSTGRES_PASSWORD: test123
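
The container section of the pod template changes accordingly. The following is only a sketch; the PGDATA entry is an assumption that is needed when your storage provisioner places a lost+found directory on the volume, which the postgres image refuses to initialize over:

containers:
- name: postgres
  image: postgres:14
  ports:
  - containerPort: 5432
    name: postgres
  env:
  - name: PGDATA
    # keep the data in a subdirectory of the mounted volume
    value: /var/lib/postgresql/data/pgdata
  volumeMounts:
  - name: postgres
    mountPath: /var/lib/postgresql/data

As in Listing 2, an envFrom reference pulls in the ConfigMap values shown above.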

Microsoft delivers two different images for SQL Server from its in-house registry. The mcr.microsoft.com/mssql/server link picks up the current SQL Server image on an Ubuntu basis, whereas mcr.microsoft.com/mssql/rhel/server gets a SQL Server based on RHEL 8 (UBI 8, universal base image 8). Microsoft (MS) SQL Server uses port 1433. You have to pay attention to one small detail in the ConfigMap:

data:
   ACCEPT_EULA: "Y"
   SA_PASSWORD: "<LongPassword>"
   MSSQL_PID: "Developer"

MS SQL Server requires a password of at least eight characters that satisfies its complexity rules. If you only specify test123 here, as in the PostgreSQL example, the software stops the container immediately after it starts, and you end up in a crash loop. The ACCEPT_EULA variable is also mandatory.
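
If you do end up in such a crash loop, the container log reveals the reason; the pod name below assumes a StatefulSet named mssql:

kubectl get pods
kubectl logs mssql-0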
