Best practices when working with Docker images
Shipshape
Containers have become the most convenient method of deploying complex applications. This convenience is shared both by developers, who can comfortably work in the same environment as the final deployment, and by the staff responsible for installing, updating, monitoring, and troubleshooting the application, whether they are traditional sys admins or DevOps.
Although several container solutions are present on the market, with varying levels of features, maturity, and flexibility, Docker is by far the most popular. In this article, I offer a couple of useful tips to make your work with Docker images and containers more efficient.
Be Careful with the "latest" Tag
Despite its name, latest doesn't mean that the image tagged as such is actually the most recent version; it's just the default tag applied to any image that is built or pushed without an explicit tag. Consequently, if you publish newer images under other tags, latest keeps pointing to whatever build was last left untagged, which might be far from the most recent one.
Moreover, because latest is just a shortcut for the default value, it can be easily overwritten by other team members; that is, if you don't tag your images, each new image pushed to the registry will have the latest tag automatically applied, which might create confusion.
If you don't explicitly specify an image tag, Kubernetes implicitly assumes it's latest, but because tags are mutable and latest is easily overwritten, it's hard to say what image is really deployed. What is more, if you set imagePullPolicy to Always and one of your pods dies, Kubernetes will pull whatever image currently carries the latest tag, possibly resulting in a different image than in the other pods. That itself is bad enough and might become worse if your latest image is broken.
What to use instead? Abbreviated Git commit hashes work perfectly and are a good fit for automated continuous integration and continuous deployment (CI/CD) pipelines. For humans, semantic versioning (MAJOR.MINOR.PATCH) might work better. You can even use both, pushing the image with multiple tags. In any case it's crucial that you treat image tags as immutable and permanently assigned to a given build. When a build changes, change the tag. The point is, to deploy a new version or roll back to the previous one, you need to have two distinct versions that you can reliably identify as such, and the latest tag is a bad fit for that.
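As a minimal sketch of this tagging scheme, the following shell function builds an image once and pushes it under both an abbreviated Git hash and a semantic version. The function name tag_and_push, the image name, and the version number are hypothetical placeholders; the git and docker subcommands are standard.

```shell
# Sketch: build once, then publish under two immutable tags.
# tag_and_push is a hypothetical helper name; its arguments are an
# image repository and a semantic version chosen by you.
tag_and_push() {
    image="$1"      # e.g. registry.example.com/team/app (placeholder)
    version="$2"    # e.g. 1.4.2 (placeholder)

    # Abbreviated hash of the commit being built.
    sha="$(git rev-parse --short HEAD)" || return 1

    # One build, two tags pointing at the same image.
    docker build -t "${image}:${sha}" -t "${image}:${version}" . || return 1

    # Push both tags; neither should ever be reused for another build.
    docker push "${image}:${sha}" || return 1
    docker push "${image}:${version}"
}
```

In a CI job, a call such as tag_and_push registry.example.com/team/app 1.4.2 would then be the only place where tags are assigned, which makes it easier to keep them immutable.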
That said, if you are a solo developer and are using images locally, latest can be a useful shortcut and you can work with it without ever encountering a problem.
One Service per Container
Many typical applications comprise more than one service. You often have a front end with JavaScript, a back end with a framework like the Django REST framework (DRF) or something similar communicating with the front end, and some kind of persistent data store, whether SQL or NoSQL. It would be tempting to pack it up together in one container – but that's exactly what you shouldn't do.
One of the many benefits of decoupling the above components is that it's easier to work with the app as a team. Moreover, when you deploy the application, each container can be monitored and controlled separately, so if one of them crashes, it can be restarted automatically without touching the others. Updates are easier, too, because you can replace one service without rebuilding the rest. Therefore you should always separate services.
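To make the separation concrete, here is a rough sketch that starts a database, a back end, and a front end as three containers on one user-defined network, each with its own restart policy. The function name start_stack, the network name appnet, the example/backend and example/frontend images, and the password are all made-up placeholders; only the docker options themselves are real.

```shell
# Sketch: one service per container, all joined by a private network.
# start_stack and every name below are hypothetical placeholders.
start_stack() {
    # Containers on a user-defined network can reach each other by name.
    docker network create appnet || return 1

    # Data store: restarted automatically unless you stop it yourself.
    docker run -d --name db --network appnet \
        --restart unless-stopped \
        -e POSTGRES_PASSWORD=change-me \
        postgres:16 || return 1

    # Back end (e.g., a DRF application) reaches the database as "db".
    docker run -d --name backend --network appnet \
        --restart unless-stopped \
        example/backend:1.0.0 || return 1

    # Front end is the only container that publishes a port to the host.
    docker run -d --name frontend --network appnet \
        --restart unless-stopped \
        -p 8080:80 \
        example/frontend:1.0.0
}
```

Because each service lives in its own container, a command such as docker logs backend or docker restart backend affects only that service.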
Keep It Small
As a general rule, your images should be as small as possible for several reasons: You save space, reduce transfer times, and (most importantly) reduce the startup time of the container during deployments – at the same time reducing the number of vulnerabilities (discussed later).
You also need to remember that several Dockerfile instructions, such as RUN, COPY, and ADD, create additional layers (FROM merely brings in the layers of the base image). You can reduce the number of layers by chaining several separate commands into one long multiline RUN instruction connected with &&. Although you can get several useful pieces of information on an image with docker inspect, docker history is a handy alternative when it comes to layer sizes (Figure 1). It also conveniently displays the commands used to create each layer.
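As an illustration of chaining, the snippet below writes a small, hypothetical Dockerfile in which three shell commands share a single RUN instruction and therefore a single layer. The file name Dockerfile.chained, the debian base image, and the curl package are my assumptions for the sake of the example.

```shell
# Sketch: write a Dockerfile whose install steps form one layer.
# Dockerfile.chained and the curl package are hypothetical choices.
cat > Dockerfile.chained <<'EOF'
FROM debian
# Update, install, and clean up in ONE layer; a separate "RUN rm ..."
# would not shrink the image, because the earlier layers keep the files.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
EOF
```

After building it with docker build -f Dockerfile.chained -t chained:v1 . you can run docker history chained:v1 to check that the three commands show up as a single layer.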
What if you need an extensive environment to build an image? In many cases the so-called multistage builds can help. To understand their usefulness, have a look at Listing 1, which contains a simple C command-line interface (CLI) application; when you compile it with GCC (Listing 2), the resulting size will be less than 30KB.
Listing 1
A Simple C Program
#include <stdio.h>

int main ()
{
    printf("Hello, world!\n");
}
Listing 2
Dockerfile Build
FROM gcc
COPY hello.c .
RUN gcc -o hello hello.c
CMD ["./hello"]
Now see what happens if you build a Docker image with the same program by creating a file named Dockerfile with the content of Listing 2 and placing it in the same directory as the C source. Once you've built it with
docker build -t chello:v1 .
you will notice (e.g., by issuing the
docker images | grep chello | awk '{print $7}'
command) that the size of the image is more than 1GB!
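The awk column in the command above depends on how the CREATED field happens to be worded, so it can pick the wrong field. A slightly more robust sketch asks Docker for the size directly with a Go template; image_size is a hypothetical helper name.

```shell
# Sketch: print only the SIZE column for a given image reference.
# image_size is a hypothetical helper; --format takes a Go template.
image_size() {
    docker images --format '{{.Size}}' "$1"
}
```

Calling image_size chello:v1 prints just the size of that image.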
Fortunately, the problem is easy to fix because you don't need the whole GCC environment in the running container. It is only needed for the build, so you can easily discard most of it – thus, the purpose of multistage builds: In the first stage you prepare your build environment, proceed with the build process, and then use the resulting artifacts in the next stages.
In Listing 3, you will notice two FROM instructions. The first starts a stage named stage_one, which proceeds with the same COPY and RUN instructions as before. However, the second (and in this case, final) FROM instruction marks the beginning of another stage. This time, instead of the large GCC image, it starts with Debian. The COPY instruction uses the special option --from=stage_one, meaning the artifact should be copied from the stage named stage_one. Because it copies only one file, the resulting image is an order of magnitude smaller: 124MB.
Listing 3
Multistage Build
FROM gcc AS stage_one
COPY hello.c .
RUN gcc -o hello hello.c

FROM debian
COPY --from=stage_one hello .
CMD ["./hello"]
This is still a lot, though. You can make it smaller if you use the lightweight Alpine alternative instead of Debian, with one caveat: Instead of the GNU C library (glibc), Alpine uses the lightweight musl library, so if you just put alpine in place of debian in Listing 3, you will get a confusing file not found error, because the dynamically linked binary cannot find the glibc loader it expects. In this case, you can solve it by building your program as a static application, passing the -static option to GCC. If you are adventurous, you can even use the scratch virtual image, but it's not recommended for practical purposes (no shell or other features for diagnostics).
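A static variant of Listing 3 might look like the following sketch, which writes the Dockerfile from the shell. Only the -static option and the alpine base differ from Listing 3; the file name Dockerfile.static is my own choice.

```shell
# Sketch: multistage build with a statically linked binary on Alpine.
# Dockerfile.static is a hypothetical file name.
cat > Dockerfile.static <<'EOF'
FROM gcc AS stage_one
COPY hello.c .
# -static links the C library into the binary, so no glibc is needed
# in the final image.
RUN gcc -static -o hello hello.c

FROM alpine
COPY --from=stage_one hello .
CMD ["./hello"]
EOF
```

Build it with docker build -f Dockerfile.static -t chello:v3 . (the v3 tag is arbitrary). Replacing alpine with scratch in the second stage should also work for this particular binary, at the cost of having no shell in the container.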