Secure microservices with centralized zero trust

Deployment on Kubernetes

The SPIRE quickstart for Kubernetes [5] is easy to deploy, and its example commands demonstrate the delivery of an SVID to a simple client workload (also created in the quickstart guide) and let you inspect its content. Don't be put off by the comment that it has been tested on Kubernetes versions 1.13.1, 1.12.4, and 1.10.12, all of which are at least five years out of date; I tried it on Kubernetes v1.24 and had no problems, because all of the Kubernetes objects required are completely standard. Simply clone the repo [6] and deploy the YAML files in the quickstart directory.

Your Kubernetes cluster must have a default storage class, because, as you might expect, the spire-server pod stores its SQLite data store on a persistent volume. You'll see that a new namespace called spire is created. Inside that namespace, spire-server is deployed as a statefulset, and spire-agent is deployed as a daemonset, with a pod on each worker node in the cluster. The Unix socket through which pods will access the SPIRE Workload API can be seen at /run/spire/sockets/agent.sock on each Kubernetes node.

However, the quickstart is of limited use because it doesn't provide a reliable way to make the Workload API Unix socket available to your application pods. In a SPIFFE domain, a pod with no access to a Workload API socket is like Jason Bourne with no fisherman to rescue him! Therefore, I recommend bypassing the quickstart and starting with the SPIFFE container storage interface (CSI) driver example [7] straightaway. The SPIFFE CSI example deploys spire-server and spire-agent just like the official quickstart does, but it additionally creates a CSI driver, csi.spiffe.io, implemented by means of another daemonset, spiffe-csi-driver.

The CSI driver connects each node's Workload API Unix socket to any pod that requires it in the form of a volume. If you're familiar with Kubernetes, it might seem more straightforward simply to mount the Unix socket into the pod as a hostPath volume; however, the security policies in many Kubernetes clusters prevent non-privileged pods from hostPath mounting. The CSI method, although more expensive in terms of cluster resources, is at least reliable.
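
If you're wondering what consuming the driver looks like from the workload side, the stanza below is a minimal sketch of a pod spec excerpt (the volume name, container name, and mount path are my own choices for illustration; the driver name csi.spiffe.io is fixed):

    # Hypothetical excerpt from a workload pod spec: the SPIFFE CSI driver
    # delivers the Workload API socket directory as a read-only volume.
    spec:
      containers:
      - name: my-workload                 # hypothetical container
        image: my-workload:latest         # hypothetical image
        volumeMounts:
        - name: spiffe-workload-api
          mountPath: /spiffe-workload-api
          readOnly: true
      volumes:
      - name: spiffe-workload-api
        csi:
          driver: "csi.spiffe.io"
          readOnly: true

The workload then reaches the Workload API at unix:///spiffe-workload-api/spire-agent.sock (assuming the CSI example's socket name).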

To deploy the SPIRE CSI example onto a Kubernetes cluster, take these steps:

  1. Clone the SPIFFE CSI repo to the host you use for performing Kubernetes admin tasks:

    git clone https://github.com/spiffe/spiffe-csi
  2. Amend spiffe-csi/example/config/spire-server.yaml to add a persistent volume and a corresponding volumeMount for the /run/spire/data mountpoint; this is optional, but recommended for even the most trivial production use, because, unlike the quickstart example for Kubernetes, the SPIFFE CSI example does not specify a persistent volume for the SPIRE Server (a sketch follows these steps).
  3. Execute the deployment script:

    spiffe-csi/example/deploy-spire-and-csi-driver.sh

    This action applies all of the YAML files under spiffe-csi/example/config with kubectl. Check the output for any Kubernetes-related errors. At this stage, the most likely cause of any problems is that your Kubernetes context does not have sufficient permissions to create all of the required objects.

  4. Check the content of the spire namespace (Figure 4):

    kubectl get all -n spire
    Figure 4: The content of the spire namespace in a Kubernetes cluster.

    At this stage, it is also interesting to review the contents of the configmaps that were created by the deployment script. To do this, run the command:

    kubectl -n spire get cm spire-agent spire-server -o yaml

    You can clearly identify important configuration options, such as the name of the trust domain, the subject of the built-in CA, and the TTL of the SVIDs (an excerpt appears after these steps). To change them, edit the configmaps, for example, with:

    kubectl -n spire edit cm spire-server

    and restart the relevant pods to make your changes take effect.

  5. If no spiffe-csi-driver pods are running, check the status of the spiffe-csi-driver daemonset:

    kubectl -n spire describe ds spiffe-csi-driver

    The spiffe-csi-driver pods will not be scheduled if the pod security policies in place on your cluster prevent the creation of privileged pods in the spire namespace. That's because they use a hostPath volume mount to connect to the Workload API's Unix socket directory at /run/spire/agent-sockets on each worker host, and they need to be privileged to do so.

  6. Check that the Workload API's Unix socket has been created on the Kubernetes workers (it won't exist on the masters):

    ls /run/spire/agent-sockets/spire-agent.sock
  7. Check that the spire-agent pods are connected to spire-server and have successfully performed node attestation:

    SPIRE_SERVER_POD=$(kubectl -n spire get po -l app=spire-server -o jsonpath="{.items[0].metadata.name}")
    kubectl -n spire logs $SPIRE_SERVER_POD | grep -B1 attestation

    You will see that the SPIRE Server issued an SVID to the node agent in the form spiffe://example.org/spire/agent/k8s_psat/<cluster name>/<kubernetes node uid>. If you run

    kubectl get nodes -o yaml | grep uid:

    you'll see that the SPIFFE IDs issued to the nodes do indeed match the nodes' Kubernetes universally unique identifiers (UUIDs). This match is made possible by SPIRE's k8s_psat node attestation plugin (Figure 5), which lets the SPIRE Server verify the identity of the attesting node by querying the Kubernetes API. More information about node attestation with projected service account tokens (PSATs) is given in the SPIRE docs [8].

    Figure 5: spire-server logs showing the node attestation process.
  8. Create registration entries for the spire-agents. Later, you'll specify the spire-agent SPIFFE ID as the parent ID for each of the workload registrations you create; this is how you control which nodes are allowed to run each workload:

    kubectl -n spire exec $SPIRE_SERVER_POD -- /opt/spire/bin/spire-server entry create \
      -spiffeID spiffe://example.org/ns/spire/sa/spire-agent \
      -parentID spiffe://example.org/spire/server \
      -selector k8s_psat:cluster:example-cluster \
      -selector k8s_psat:agent_ns:spire \
      -selector k8s_psat:agent_sa:spire-agent

    This generic registration will be matched by spire-agent on each Kubernetes worker; therefore, each spire-agent pod will receive the complete bundle of SVIDs for all the workloads whose registrations specify this spire-agent registration as the parent. If workloads were tied to particular nodes (e.g., by pod affinity), you could instead create multiple spire-agent registrations with node-specific k8s_psat selectors (e.g., k8s_psat:agent_node_name) and set those as the workloads' parent IDs, so the workloads could attest successfully only when running on the correct nodes (a sketch follows these steps).

    At this point, the SPIRE infrastructure is ready for use, and workloads can be deployed by the sequence shown in Figure 3.
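
Returning to step 2: the amendment to spire-server.yaml might look roughly like the following (the claim name and storage size are my own choices, not part of the example; only the /run/spire/data mountpoint is fixed by the SPIRE configuration):

    # Sketch: add to the spire-server statefulset so the SQLite data store
    # survives pod restarts.
    volumeClaimTemplates:
    - metadata:
        name: spire-data                # hypothetical claim name
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi                # arbitrary size

    # ...and in the spire-server container spec:
    volumeMounts:
    - name: spire-data
      mountPath: /run/spire/data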
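
Regarding step 4, the spire-server configmap wraps a server.conf whose server block looks something like the excerpt below; exact field names vary between SPIRE versions, so treat this as an illustration of where the trust domain, CA subject, and SVID TTL live rather than the literal file content:

    server {
      trust_domain = "example.org"
      data_dir = "/run/spire/data"
      default_svid_ttl = "1h"
      ca_subject {
        country = ["US"]
        organization = ["SPIFFE"]
      }
    }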
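
And to expand on the node-specific option mentioned in step 8, a per-node agent registration might look like this sketch, with node-a standing in for a real node name:

    kubectl -n spire exec $SPIRE_SERVER_POD -- /opt/spire/bin/spire-server entry create \
      -spiffeID spiffe://example.org/ns/spire/sa/spire-agent/node-a \
      -parentID spiffe://example.org/spire/server \
      -selector k8s_psat:cluster:example-cluster \
      -selector k8s_psat:agent_node_name:node-a

Workload registrations that name this entry's SPIFFE ID as their parent ID would then attest successfully only on node-a.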

Register and Deploy Workloads

The example application secures communication between a client workload and a server workload with SVIDs generated by the infrastructure just installed. This simple example is written in Go and uses the SPIFFE Go library. The server waits for an incoming connection; when it receives one, it requests its own SVID and then checks that the TLS certificate of the inbound client request contains the SPIFFE ID it has been configured to expect. Meanwhile, the client repeatedly requests its own SVID, then sends an HTTPS GET request to the server, checking that the server presents a TLS certificate matching the expected SPIFFE ID.
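
In go-spiffe terms, the client logic boils down to something like the following sketch; the socket path, server URL, and SPIFFE IDs are assumptions chosen to match the CSI example, not values copied from the repo:

    package main

    import (
        "context"
        "io"
        "log"
        "net/http"

        "github.com/spiffe/go-spiffe/v2/spiffeid"
        "github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
        "github.com/spiffe/go-spiffe/v2/workloadapi"
    )

    func main() {
        ctx := context.Background()

        // Fetch (and automatically rotate) this workload's X.509 SVID from the
        // Workload API socket mounted into the pod by the SPIFFE CSI driver.
        source, err := workloadapi.NewX509Source(ctx, workloadapi.WithClientOptions(
            workloadapi.WithAddr("unix:///spiffe-workload-api/spire-agent.sock")))
        if err != nil {
            log.Fatalf("could not create X509Source: %v", err)
        }
        defer source.Close()

        // Trust only a server presenting this SPIFFE ID in its TLS certificate.
        serverID := spiffeid.RequireFromString("spiffe://example.org/ns/default/sa/spire-https-server")
        tlsConf := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(serverID))

        client := &http.Client{Transport: &http.Transport{TLSClientConfig: tlsConf}}
        resp, err := client.Get("https://spire-https-server:8443") // hypothetical address
        if err != nil {
            log.Fatalf("request failed: %v", err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        log.Printf("received: %s", body)
    }

To deploy the example, take these steps: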

  1. Clone the repository containing the example:

    git clone https://github.com/datadoc24/golang-spire-examples.git

    This repository contains the Golang source code and Dockerfile in the example directory and a YAML file for Kubernetes deployment in the k8s directory.

  2. Apply the YAML file to your default namespace:

    kubectl apply -f golang-spire-examples/k8s/spire-https.yaml
  3. Register the workloads with the SPIRE Server. Suitable registration commands for the two workload pods, along with the spire-agent registration used in the infrastructure example, are in golang-spire-examples/k8s/reg.sh (a representative entry appears after these steps).
  4. Check that the spire-https-client and spire-https-server pods are running in your default namespace, and see on which nodes Kubernetes deployed them. Tail the logs of the spire-agent pod on one of those nodes, and see the workload attestation process (Figure 6).

    Figure 6: spire-agent logs generated by the workload attestation process.
  5. Tail the logs of the client pod with

    kubectl logs -l app=spire-https-client -f

    You'll see that it is sending requests to the server and that the data specified in the server pod spec's DATA_TO_SEND environment variable is received by the client. The logs also print out the PEM content of the client pod's SVID, which was shown in Figure 2.

  6. The acid test: you want to be sure that communication between the client and the server will break down if either side's SVID expires and is not renewed. The following tests are a good way to prove that:

    * Stop the SPIRE Server. Communication should break when the SPIRE Agents' SVIDs expire.

    * List all registrations and delete one of the workload registrations (run these commands inside the spire-server pod, as before):

    spire-server entry show
    spire-server entry delete -entryID <ID of registration to be deleted>

    * Finally, edit the spire-https-client or spire-https-server deployment to change the expected SPIFFE ID of the other workload. For example, run

    kubectl edit deploy spire-https-client

    and change the value of the SERVER_SPIFFE_ID environment variable. Saving the edited deployment will automatically recreate the client workload using the updated value. Tailing the logs of the recreated client workload pod will show you that the mTLS connection is now failing.
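
To give a flavor of the registrations in reg.sh, a workload entry parented to the spire-agent ID looks something like this sketch (the SPIFFE ID and selectors are illustrative; check reg.sh for the real values):

    kubectl -n spire exec $SPIRE_SERVER_POD -- /opt/spire/bin/spire-server entry create \
      -spiffeID spiffe://example.org/ns/default/sa/default/client \
      -parentID spiffe://example.org/ns/spire/sa/spire-agent \
      -selector k8s:ns:default \
      -selector k8s:pod-label:app:spire-https-client

The k8s selectors come from SPIRE's Kubernetes workload attestor, which matches them against the pod that connects to the Workload API socket.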

Troubleshooting

The logs of the spire-agent pods are an excellent source of debugging information and of visibility into the data provided by the attestor plugins. In them, you can see whether node attestation was successful and whether the agents themselves successfully received a bundle from the SPIRE Server, which tells you whether the spire-agent registrations were created correctly.

You can also see whether workloads are contacting the spire-agent through the Workload API and are receiving the correct SVID. Search the spire-agent pod logs for the message "PID attested to have selectors". The absence of these messages suggests that communication on the Workload API socket is not set up correctly. When an SVID is delivered to a workload, the logs will show the SVID's SPIFFE ID. Check for these points:

  • The hostPath volume for the Workload API socket must be identical between the spire-agent and spiffe-csi-driver daemonsets; check the volume definition in both to make sure it matches (see the sketch after this list).
  • Pod processes (spire-agent and user workloads alike) must use a local Workload API socket path that matches the volumeMount that maps the hostPath socket into the pod.
  • The registrations created via the Registration API must have a sufficiently specific combination of selectors to ensure that each workload is correctly identified by the attestation process. If not, your workload might have received an SVID intended for another workload, and will therefore not be trusted by other workloads that are set up to check SPIFFE IDs when establishing TLS connections.
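
To illustrate the first two points, the socket directory is shared between the two daemonsets by hostPath volumes along the following lines (the path assumes the defaults in the spiffe-csi repo); if the two definitions diverge, the volume the CSI driver hands to workload pods will not contain a live socket:

    # The same stanza must appear in both the spire-agent and
    # spiffe-csi-driver daemonsets:
    volumes:
    - name: spire-agent-socket-dir
      hostPath:
        path: /run/spire/agent-sockets
        type: DirectoryOrCreate

Workload pods then consume the socket through the csi.spiffe.io volume shown earlier and must dial the Workload API by the path given in their own volumeMount.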
