ClusterHAT
When I started in high-performance computing (HPC), the systems were huge, hulking beasts that were shared by everyone. The advent of clusters allowed the construction of larger systems accessible to more users. I always wanted my own cluster, but with limited funds, that was difficult. I could build small clusters from old, used systems, but the large cases took up a great deal of room. The advent of small systems, especially single-board computers (SBCs), allowed the construction of small, low-power, inexpensive, but very scalable systems.
Arguably, the monarch of the SBC movement is the Raspberry Pi. It is now the third best-selling computer of all time, overtaking the Commodore 64 and trailing only the PC and the Mac, and it has sparked a whole industry around small, inexpensive, low-power but expandable computers that can be used for anything from sensors in the field, to desktops, to retro game consoles, and even to experiments on the International Space Station. The top-end Raspberry Pi, the Raspberry Pi 3 (RPi3), is about $35, and the introduction of the Raspberry Pi Zero (Pi Zero) in 2015 set the low-end price at $5.
People have been building clusters from Raspberry Pi units, starting with the original Raspberry Pi Model A, ranging from two to more than 250 nodes. That early 32-bit system had a single core running at 700MHz with 256MB of memory. You can build a cluster of five RPi3 nodes with 20 cores connected by a Gigabit Ethernet switch for about $300, including a case and case fan.
Fairly recently, a company created a Hardware Attached on Top (HAT) add-on board that attaches to a single “host” RPi2 or 3. The ClusterHAT fits onto the GPIO pins of the host (master node) and has four micro-USB slots, each of which accepts a Pi Zero, providing both power and a network connection between the Pi Zeros and the host node.
The ClusterHAT kit is a little over $25 and includes the HAT board, four standoffs, a handy USB cable, and some plastic feet if you want to put them on the bottom of your host node (Figure 1). Putting everything together is very easy, and you can watch a video on the ClusterHAT website that shows how it’s done.
After threading the USB cable between the HAT and the RPi3, attaching the HAT, and snapping the Pi Zero boards into the HAT, you should have something that looks like Figure 2. Next, attach a keyboard, a mouse, an external power supply, and a monitor (Figure 3). Notice that the Pi Zeros are powered on in this image (i.e., the lights near the boards are lit).
The RPi2 or 3 needs a good power supply capable of 2 to 2.5A and at least one microSD card for the master node. You can put a card in each Pi Zero, if you want, or you can NFS boot each one (although that’s a little experimental). I put a 16GB microSD card into each of the five Raspberry Pis.
The ClusterHAT site has created Raspbian Jessie-based software images that have been configured with a few simple tools for the ClusterHAT. Jessie is a little different from past Raspbian versions, and the biggest difference that affects the ClusterHAT is the use of DHCP by default.
For this article, the target cluster configuration uses an RPi3 as the master node and Pi Zeros in the ClusterHAT as compute nodes. The master node will be an NFS server, with /home and /usr/local, for the four Pi Zeros. Additionally, the cluster will use passwordless SSH and pdsh, a high-performance, parallel remote shell utility. MPI and GFortran will be installed for building MPI applications and testing.
At this point, the ClusterHAT should be assembled and the operating system (OS) images copied to the microSD cards for the five nodes. The next steps are to boot the images for the first time on the RPi3 and configure them to meet the previously mentioned target configuration.
Master Node
Because the master node effectively controls the cluster, getting the OS configuration correct is important. Fortunately, only a small number of changes need to be made to the image provided on the website.
The first step is to boot the master node (RPi3) with its microSD card. Be sure it is plugged in to your local network and can access the Internet. After the RPi3 boots, you should be in the Pixel desktop (Figure 4). A few “classic” configurations are called for at this point with the help of the raspi-config command:
- Expand the storage to use the entire microSD card
- Change the default password for the pi account
- Enable SSH (it is no longer enabled by default in Raspbian)
- Switch the keyboard to a US keyboard (by default, Raspbian uses a UK keyboard)
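To make these changes, launch the tool with root privileges and work through the menus; the exact menu names vary a little between Raspbian releases, and a reboot is usually required after expanding the filesystem:
$ sudo raspi-config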
After configuring, you need to install the NFS server packages on the RPi3 (master node):
$ sudo apt install nfs-common nfs-kernel-server
Next, the NFS exports file, which lists the filesystems to be exported, needs to be created. The file /etc/exports should be created or edited to include the following:
/home *(rw,sync,no_subtree_check)
/usr/local *(rw,sync,no_subtree_check)
Notice that the filesystems are exported globally with the “*” wildcard. Although this is usually not a good idea, because anyone could mount the filesystems, I almost always run the system with no Internet access, so I’m not too worried. To make things safer, you can specify an IP range that can mount the filesystems.
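For example, assuming the nodes sit on a hypothetical 192.168.1.0/24 network (substitute whatever range your cluster actually uses), the exports could be restricted along these lines:
/home 192.168.1.0/24(rw,sync,no_subtree_check)
/usr/local 192.168.1.0/24(rw,sync,no_subtree_check)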
To ensure that the NFS server starts when the RPi3 is booted, run the following commands:
$ sudo update-rc.d rpcbind enable
$ sudo /etc/init.d/rpcbind start
$ sudo /etc/init.d/nfs-kernel-server restart
Note that these commands need to be run whenever the master node is rebooted.
The filesystems can now be exported and then listed to verify that they really are exported:
$ sudo exportfs -ra
$ sudo exportfs
/home
/usr/local
Next, SSH is configured so that passwordless logins can be used on the cluster. Many tutorials on the web explain how to accomplish this.
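The usual recipe, sketched here for the default pi account, is to generate a key pair without a passphrase on the master node and then, once the Pi Zeros are up, copy the public key to each of them:
$ ssh-keygen -t rsa
$ ssh-copy-id pi@p1.local
$ ssh-copy-id pi@p2.local
$ ssh-copy-id pi@p3.local
$ ssh-copy-id pi@p4.local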
After SSH, both GFortran, the GCC Fortran compiler, and MPICH, a high-performance implementation of the Message Passing Interface (MPI) standard, are installed using apt:
$ sudo apt install gfortran mpich
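A quick sanity check, assuming the Raspbian packages installed cleanly, is to query the versions of the compiler, the MPI compiler wrapper, and the MPI launcher:
$ gfortran --version
$ mpif90 --version
$ mpiexec --version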
The HPC world uses tools that allow commands to run across the entire cluster or a subset of nodes in the cluster. My preferred “parallel shell” tool is pdsh. Consult one of my previous articles for directions on how to build, install, and use pdsh. I installed pdsh for the ClusterHAT in /usr/local. A /home/pi/PDSH directory was created with a /home/pi/PDSH/hosts file that lists the default nodes to be addressed by pdsh when no nodes are specified. For the ClusterHAT, the list of nodes is:
p1.local
p2.local
p3.local
p4.local
As an option, controller.local could be added to the list if the master node is to be a default target for commands.
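As a rough sketch of day-to-day use, assuming pdsh was built with ssh support, pointing the WCOLL environment variable at the hosts file lets a command run across all of the default nodes at once:
$ export PDSH_RCMD_TYPE=ssh
$ export WCOLL=/home/pi/PDSH/hosts
$ pdsh uptime
Alternatively, the -w option takes an explicit, comma-separated node list for one-off commands.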
Configuring Compute Nodes
The ClusterHAT site provides several ways to build images for the compute nodes. I took the easier route and downloaded five images – one for the controller (head node or master node), and one each for the four compute nodes – and copied them to a microSD card for each node. By taking this approach, each image could be booted in the RPi3 so that changes could be made before booting the entire cluster.
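On a Linux workstation, writing an image to a card typically comes down to dd; the image filename and the /dev/sdX device below are placeholders, and it is worth double-checking the device name, because dd will happily overwrite whatever disk it is pointed at:
$ sudo dd if=clusterhat-p1.img of=/dev/sdX bs=4M
$ sync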
Unlike the master node, the compute nodes do not boot into the desktop; they just boot to the command line so you can log in to the node.
Booting each image allows you to make the basic changes that are needed on the first boot of a Raspbian system. First, the raspi-config command must be used, as discussed for the master node, to extend the filesystem to use the entire microSD card and enable SSH (see the bulleted list above). Changing the password should also be done on all the compute node images; they should share the same password, but not the default, raspberry.
To make life easier, I like to have my clusters share a common /home for the users and /usr/local/ for applications shared by the nodes. The ClusterHAT cluster is no exception: I want to mount /home and /usr/local from the master node to all of the compute nodes. I added the following line to /etc/fstab on all of the compute node images:
controller.local:/home /home nfs defaults 0 0
controller.local:/usr/local /usr/local nfs defaults 0 0
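With the master node up and exporting, the entries can be tested by hand from a booted compute node image before relying on them; a minimal check looks something like this:
$ sudo mount -a
$ df -h /home /usr/local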
GFortran and MPICH also have to be installed on each compute node. They were installed on the master node from the Raspbian repositories, so they live in the system directories rather than in the NFS-shared /usr/local or /home; consequently, each node needs its own copy.
By default, the Pi Zero nodes are named p1.local, p2.local, p3.local, and p4.local. If you look at the ClusterHAT from above, at one end you can see the labels p1, p2, p3, and p4, for the four slots. The master node has the node name controller.local.
After setting up the master node, the ClusterHAT, and the compute nodes, it's time for the first boot!