Detect failures and ensure high availability
On the Safe Side
Installing and Configuring Corosync/Pacemaker
On both nodes, install the Pacemaker and crmsh packages (Corosync and all other required dependencies will be pulled in automatically):
$ sudo aptitude install pacemaker crmsh
On the primary node, generate a Corosync cluster key:
$ sudo corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 2048 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.
Copy the newly created key over to the secondary node:
$ sudo scp /etc/corosync/authkey ubu22042-2:/etc/corosync/
On both nodes, change the permissions to read-only for the file owner (root):
$ sudo chmod 400 /etc/corosync/authkey
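A quick ls -l on each node confirms the permissions; the size and timestamp will differ on your systems, but the mode should show read-only for root:
$ ls -l /etc/corosync/authkey
-r-------- 1 root root 256 Apr 8 16:50 /etc/corosync/authkey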
Again, on both nodes, modify /etc/corosync/corosync.conf to look like Listing 5, making the proper modifications to the hostnames and IP addresses. The bindnetaddr field is unique to each node: set it to the IP address of the node on which the file resides.
Listing 5
/etc/corosync/corosync.conf
totem {
  version: 2
  cluster_name: drbd-cluster
  crypto_cipher: none
  crypto_hash: none
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: 10.0.0.62
    broadcast: yes
    mcastport: 5405
  }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
}

logging {
  to_logfile: yes
  logfile: /var/log/corosync/corosync.log
  to_syslog: yes
  timestamp: on
}

nodelist {
  node {
    ring0_addr: 10.0.0.216
    name: ubu22042-1
    nodeid: 1
  }
  node {
    ring0_addr: 10.0.0.62
    name: ubu22042-2
    nodeid: 2
  }
}

service {
  name: pacemaker
  ver: 0
}
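On the other node, only the bindnetaddr value changes to that node's own address. On ubu22042-1, for example, the interface block would look like this:
interface {
  ringnumber: 0
  bindnetaddr: 10.0.0.216
  broadcast: yes
  mcastport: 5405
}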
On both nodes, enable and start both Corosync and Pacemaker services:
$ sudo systemctl enable corosync
$ sudo systemctl start corosync
$ sudo systemctl enable pacemaker
$ sudo systemctl start pacemaker
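If either service fails to start, a quick check on each node tells you which one is at fault; both should report active:
$ systemctl is-active corosync pacemaker
active
active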
The cluster should now be up and running. On any node, list the status of the cluster (Listing 6). You should observe both nodes listed with an Online status.
Listing 6
Cluster Status
$ sudo crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: ubu22042-2 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Sat Apr 8 17:35:53 2023
  * Last change: Sat Apr 8 17:35:10 2023 by hacluster via crmd on ubu22042-2
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ ubu22042-1 ubu22042-2 ]

Full List of Resources:
  * No resources
For the purposes of this experiment, disable the STONITH (Shoot The Other Node In The Head) fencing technique and ignore the quorum state of the cluster:
$ sudo crm configure property stonith-enabled=false
$ sudo crm configure property no-quorum-policy=ignore
Now verify that these properties were set (Listing 7).
Listing 7
Properties
$ sudo crm configure show
node 1: ubu22042-1
node 2: ubu22042-2
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=2.0.3-4b1f869f0f \
    cluster-infrastructure=corosync \
    cluster-name=drbd-cluster \
    stonith-enabled=false \
    no-quorum-policy=ignore
Fortunately, the DRBD project has been around long enough that resource agents already exist to manage DRBD volumes. To list the DRBD-related Open Cluster Framework (OCF) resource agents available to Pacemaker, enter:
$ crm_resource --list-agents ocf|grep -i drbd
drbd
drbd.sh
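If you want to see which parameters the DRBD agent accepts (such as the drbd_resource parameter used below), crmsh can print the agent's metadata:
$ crm ra info ocf:linbit:drbd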
To finish the cluster configuration, enter the Cluster Resource Manager (CRM) interactive shell:
$ sudo crm configure
Next, enable the configurations in Listing 8, making the proper modifications to account for the disk device, mount directory, and filesystem type.
Listing 8
Enable Config
crm(live/ubu22042-1)configure# primitive drbd_res ocf:linbit:drbd params drbd_resource=r0 op monitor interval=29s role=Master op monitor interval=31s role=Slave
crm(live/ubu22042-1)configure# ms drbd_master_slave drbd_res meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
crm(live/ubu22042-1)configure# primitive fs_res ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/srv fstype=ext4
crm(live/ubu22042-1)configure# colocation fs_drbd_colo INFINITY: fs_res drbd_master_slave:Master
crm(live/ubu22042-1)configure# order fs_after_drbd mandatory: drbd_master_slave:promote fs_res:start
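Before committing, you can optionally have crmsh check the pending configuration for errors:
crm(live/ubu22042-1)configure# verify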
To commit the configuration, enter:
crm(live/ubu22042-1)configure# commit
Now verify the configuration (Listing 9), and quit the CRM shell:
crm(live/ubu22042-1)configure# quit
bye
Listing 9
Verify and Validate Config
crm(live/ubu22042-1)configure# show
node 1: ubu22042-1
node 2: ubu22042-2
primitive drbd_res ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave
primitive fs_res Filesystem \
    params device="/dev/drbd0" directory="/srv" fstype=ext4
ms drbd_master_slave drbd_res \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
order fs_after_drbd Mandatory: drbd_master_slave:promote fs_res:start
colocation fs_drbd_colo inf: fs_res drbd_master_slave:Master
property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=2.0.3-4b1f869f0f \
    cluster-infrastructure=corosync \
    cluster-name=drbd-cluster \
    stonith-enabled=false \
    no-quorum-policy=ignore
The cluster should now know about and officially manage access to the DRBD volume created earlier (Listing 10); it is enabled on the primary and now "Active" node.
Listing 10
Cluster Status
$ sudo crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: ubu22042-2 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Sat Apr 8 17:45:40 2023
  * Last change: Sat Apr 8 17:42:15 2023 by root via cibadmin on ubu22042-1
  * 2 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ ubu22042-1 ubu22042-2 ]

Full List of Resources:
  * Clone Set: drbd_master_slave [drbd_res] (promotable):
    * Masters: [ ubu22042-1 ]
    * Slaves: [ ubu22042-2 ]
  * fs_res (ocf::heartbeat:Filesystem): Started ubu22042-1
Executing the df or mount command will confirm that the DRBD volume is mounted at /srv.
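For example, the mount output on the primary node should include a line similar to the following (the mount options may differ):
$ mount|grep drbd
/dev/drbd0 on /srv type ext4 (rw,relatime)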
The secondary node remains idle and "Passive" until your primary node becomes unavailable. To test, shut down the primary node,
$ sudo shutdown -h now
Connection to 10.0.0.216 closed by remote host.
Connection to 10.0.0.216 closed.
wait a bit (about 10 seconds), and dump the status of the cluster from the secondary node (Listing 11).
Listing 11
Secondary Node Status
$ sudo crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: ubu22042-2 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Sat Apr 8 17:47:41 2023
  * Last change: Sat Apr 8 17:42:15 2023 by root via cibadmin on ubu22042-1
  * 2 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ ubu22042-2 ]
  * OFFLINE: [ ubu22042-1 ]

Full List of Resources:
  * Clone Set: drbd_master_slave [drbd_res] (promotable):
    * Masters: [ ubu22042-2 ]
    * Stopped: [ ubu22042-1 ]
  * fs_res (ocf::heartbeat:Filesystem): Started ubu22042-2
Notice that the primary node is now listed as OFFLINE and that the secondary node has taken over hosting the "failed over" DRBD volume:
$ df|grep drbd
/dev/drbd0   20465580     28  19400628   1% /srv
If you list files again,
$ ls -l /srv/
total 20
-rw-r--r-- 1 root root    12 Apr 8 16:37 hello.txt
drwx------ 2 root root 16384 Apr 8 16:33 lost+found
you find the same test file you created before.
Summary
As mentioned earlier, the primary objective for enabling a highly available environment is to reduce both single points of failure and service downtime. You definitely can expand on the examples here to work with more supported applications and functions. To learn more about Pacemaker and the resource agents the framework supports, visit the resource agents section of the Linux-HA wiki [2].
Infos
[1] DRBD: https://linbit.com/drbd/
[2] Linux-HA resource agents wiki page: http://www.linux-ha.org/wiki/Resource_Agents