Photo by Ilse Orsel on Unsplash

Best practices when working with Docker images

Shipshape

Article from ADMIN 74/2023
By
Whether you are developing containerized applications or running them, observing best practices helps to obtain optimal results.

Containers have become the most convenient method of deploying complex applications. This convenience is shared both by developers, who can comfortably work in the same environment as the final deployment, and by the staff responsible for installing, updating, monitoring, and troubleshooting the application, whether they are traditional sys admins or DevOps.

Although several container solutions are present on the market, with varying levels of features, maturity, and flexibility, Docker is by far the most popular. In this article, I offer a couple of useful tips to make your work with Docker images and containers more efficient.

Be Careful with the "latest" Tag

Despite its name, latest doesn't mean the image tagged as such is actually the most recent version. It is just the default tag applied to any image that is built or pushed without an explicit tag. Therefore, if you create new images with other tags, latest keeps pointing to the last image created without a tag, which will no longer be the most recent one.

Moreover, because latest is just a shortcut for the default value, it can be easily overwritten by other team members; that is, if you don't tag your images, each new image pushed to the registry will have the latest tag automatically applied, which might create confusion.

If you don't explicitly specify an image tag, Kubernetes implicitly assumes latest, but because tags are mutable and latest is easily overwritten, it's hard to say which image is really deployed. What is more, if you set imagePullPolicy to Always and one of your pods dies, Kubernetes will pull whatever image currently carries the latest tag, possibly resulting in a different image than in the other pods. That in itself is bad enough and might become worse if your latest image is broken.
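One way to catch such references before they reach a cluster is a small check in the deployment pipeline. The following is only a sketch: the function name is_pinned is hypothetical, and the parsing covers the common reference forms (name, name:tag, registry:port/name, name@sha256:digest) rather than the complete image reference grammar.

```shell
# Return 0 if the image reference is pinned to an explicit tag other than
# "latest" or to a digest; return 1 otherwise.
# (is_pinned is a hypothetical helper name, not part of Docker or Kubernetes.)
is_pinned() {
    ref=$1
    case $ref in
        *@sha256:*) return 0 ;;   # pinned by digest
    esac
    name=${ref##*/}               # last path component; drops registry[:port]/
    case $name in
        *:latest) return 1 ;;     # explicit but mutable "latest"
        *:*)      return 0 ;;     # any other explicit tag
        *)        return 1 ;;     # no tag at all, so "latest" is implied
    esac
}
```

For example, is_pinned nginx:1.25.3 succeeds, whereas is_pinned nginx and is_pinned registry.example.com:5000/shop/api both fail because neither carries a tag.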

What to use instead? Abbreviated Git commit hashes work well and are a good fit for automated continuous integration and continuous deployment (CI/CD) pipelines. For humans, semantic versioning (MAJOR.MINOR.PATCH) might work better. You can even use both, pushing the image with multiple tags. In any case, it's crucial that you treat image tags as immutable and permanently assigned to a given build. When a build changes, change the tag. The point is, to deploy a new version or roll back to the previous one, you need to have two distinct versions that you can reliably identify as such, and the latest tag is a bad fit for that.
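As a rough sketch of the "use both" approach, the functions below derive two references for the same build, one from a semantic version and one from the first seven characters of a Git commit hash, and then build and push the image under both. The function names, the repository registry.example.com/shop/api, and the version number are placeholders for illustration, not values from a real project.

```shell
# Print one image reference per tag.
# $1: repository, $2: semantic version, $3: Git commit hash
image_refs() {
    repo=$1 version=$2 commit=$3
    short=$(printf '%.7s' "$commit")      # first seven characters of the hash
    printf '%s:%s\n' "$repo" "$version"
    printf '%s:%s\n' "$repo" "$short"
}

# Build the image once with both tags, then push both references.
build_and_push() {
    set -- $(image_refs "$1" "$2" "$3")   # $1 = version ref, $2 = hash ref
    docker build -t "$1" -t "$2" . &&
    docker push "$1" &&
    docker push "$2"
}

# Possible call from a Git checkout (placeholder repository and version):
#   build_and_push registry.example.com/shop/api 1.4.2 "$(git rev-parse HEAD)"
```

Because both tags are attached in a single docker build invocation, they are guaranteed to point to the same image.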

That said, if you are a solo developer and are using images locally, latest can be a useful shortcut and you can work with it without ever encountering a problem.

One Service per Container

Many typical applications comprise more than one service. You often have a front end with JavaScript, a back end with a framework like the Django REST framework (DRF) or something similar communicating with the front end, and some kind of persistent data store, whether SQL or NoSQL. It would be tempting to pack it up together in one container – but that's exactly what you shouldn't do.

One of the many benefits of decoupling the above components is that it's easier to work with the app as a team. Moreover, when you deploy the application, each container can be monitored and controlled separately, so that a container that crashes can be restarted automatically without touching the others. Updates are easier, too, because each service can be replaced on its own. Therefore, you should always separate services.
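To illustrate the separation, the following sketch starts the three parts of such an application as individual containers on a shared user-defined network, each with its own restart policy. The network name appnet, the images example/backend and example/frontend, their version tags, and the port mapping are assumptions made up for this example; only postgres:16 refers to a real public image. Passing the password through an environment variable keeps the sketch short and is not meant as a recommendation for production.

```shell
# Start each service in its own container (hypothetical names and images).
start_stack() {
    docker network create appnet &&
    docker run -d --name db --network appnet --restart unless-stopped \
        -e POSTGRES_PASSWORD="${DB_PASSWORD:?set DB_PASSWORD first}" \
        postgres:16 &&
    docker run -d --name backend --network appnet --restart unless-stopped \
        example/backend:1.2.0 &&
    docker run -d --name frontend --network appnet --restart unless-stopped \
        -p 8080:80 example/frontend:1.2.0
}
```

With this layout, a command such as docker restart backend or an update of the front-end image affects only the one container concerned.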

Keep It Small

As a general rule, your images should be as small as possible for several reasons: You save space, reduce transfer times, and (most importantly) reduce the startup time of the container during deployments – at the same time reducing the number of vulnerabilities (discussed later).

You also need to remember that several Dockerfile instructions, such as RUN, COPY, and ADD, create additional layers on top of the base image named in FROM. You can reduce the number of layers by chaining several separate commands into one long multiline RUN command connected with &&. Although you can get several useful pieces of information on an image with docker inspect, docker history is a handy alternative when it comes to layer sizes (Figure 1). It also conveniently displays the commands used to create each layer.

Figure 1: Sample output of docker history.
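If only the size of each layer and the instruction that created it are of interest, the output can be narrowed with the --format option. The wrapper below is a minimal sketch; the function name layer_sizes is hypothetical, and the image name in the usage comment is the one built later in this article.

```shell
# Print the size of each layer followed by the instruction that created it.
# $1: image reference
layer_sizes() {
    docker history --no-trunc --format '{{.Size}}  {{.CreatedBy}}' "$1"
}

# Possible call:
#   layer_sizes chello:v1
```

The --no-trunc option keeps long chained commands readable in full, which helps when checking whether several instructions could be merged into one.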

What if you need an extensive environment to build an image? In many cases, so-called multistage builds can help. To understand their usefulness, have a look at Listing 1, which contains a simple C command-line interface (CLI) application; when you compile it with GCC, the resulting binary will be less than 30KB.

Listing 1

A Simple C Program

#include <stdio.h>
int main(void) {
   printf("Hello, world!\n");
   return 0;
}

Listing 2

Dockerfile Build

FROM gcc
COPY hello.c .
RUN gcc -o hello hello.c
CMD ["./hello"]

Now see what happens if you build a Docker container with the same program by creating a file named Dockerfile with the content of Listing 2 and placing it in the same directory as the C source. Once you've built it with

docker build -t chello:v1 .

you will notice (e.g., by issuing the

docker images chello --format '{{.Size}}'

command) the size of the image is more than 1GB!

Fortunately, the problem is easy to fix because you don't need the whole GCC environment in the running container. It is only needed for the build, so you can easily discard most of it – thus, the purpose of multistage builds: In the first stage you prepare your build environment, proceed with the build process, and then use the resulting artifacts in the next stages.

In Listing 3, you will notice two FROM directives. The first starts a stage named stage_one, which proceeds with the same COPY and RUN commands as before. The second (and in this case, final) FROM directive marks the beginning of another stage. This time, instead of the large GCC image, it starts with Debian. The COPY instruction uses the special option --from=stage_one, meaning the artifact should be copied from the stage named stage_one. Because only one file is copied on top of the Debian base image, the resulting image is roughly an order of magnitude smaller: 124MB.

Listing 3

Multistage Build

FROM gcc AS stage_one
COPY hello.c .
RUN gcc -o hello hello.c
FROM debian
COPY --from=stage_one hello .
CMD ["./hello"]

This is still a lot, though. You can make it smaller if you use the lightweight Alpine alternative instead of Debian, with one caveat: Instead of the GNU C library (glibc), Alpine uses the lightweight musl library, so if you just put alpine in place of debian in Listing 3, you will get a confusing file not found error when the container starts, because the dynamically linked binary cannot find the glibc loader it expects. In this case, you can solve it by building your program as a static application, passing the -static option to GCC. If you are adventurous, you can even use the scratch virtual image, but it's not recommended for practical purposes (no shell or other features for diagnostics).
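Put together, the Alpine variant might look like the following sketch, which adapts Listing 3 by adding -static to the compile step and swapping the final base image. The function names and the chello:static tag in the usage comment are hypothetical; the Dockerfile is passed to docker build on standard input, with the current directory (containing hello.c) as the build context.

```shell
# Emit a multistage Dockerfile that links hello statically and ships it on Alpine.
static_dockerfile() {
    cat <<'EOF'
FROM gcc AS stage_one
COPY hello.c .
RUN gcc -static -o hello hello.c

FROM alpine
COPY --from=stage_one hello .
CMD ["./hello"]
EOF
}

# Build the image, reading the Dockerfile from stdin.
# $1: image reference to assign
build_static() {
    static_dockerfile | docker build -t "$1" -f - .
}

# Possible call from the directory that contains hello.c:
#   build_static chello:static
```

Because the binary no longer depends on a shared C library, it should run the same way regardless of which libc the final base image provides.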
