Optimizing Docker Image Layers with Multi-Stage Builds

By Jin Larsen
How-To & Fixes · docker · devops · containerization · devops-workflow · ci-cd

Why are your Docker images so much larger than they need to be?

You build a container, push it to a registry, and suddenly you're staring at a 1.5GB image for a simple Go or Node.js application. It's a common frustration. Most developers accidentally include build-time dependencies, compilers, and heavy system libraries in their final production image. This doesn't just waste disk space; it slows down your CI/CD pipelines and increases your attack surface. By using multi-stage builds, you can separate the build environment from the runtime environment, ensuring your production image contains only what's strictly necessary to run your code.

This approach is standard practice in modern DevOps, yet many developers still struggle with the implementation. We'll look at how to strip away the noise, reduce build times, and keep your deployment artifacts lean. It's about moving away from the "everything in one box" mentality toward a more surgical approach to containerization.

How do multi-stage builds work in practice?

The concept is straightforward: you use one image to build your application and a second, much lighter image to run it. Think of the first stage as a cluttered workshop filled with hammers, saws, and scrap metal, while the second stage is just the finished piece of furniture. You don't want the hammer in your living room; you just want the table.

In a typical Dockerfile, you can define multiple FROM instructions. Each FROM instruction starts a new stage. You can name these stages using the AS keyword, which makes the file much more readable. Here is a conceptual example of a Node.js build process:

# Stage 1: The Build Stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./ 
RUN npm install
COPY . .
RUN npm run build

# Stage 2: The Production Stage
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
EXPOSE 80

In the example above, the builder stage contains the entire Node image, including npm, local caches, and all of your devDependencies. The final image, however, is based only on nginx:alpine. The only artifact copied from the first stage into the second is the /app/dist folder. Everything else (the heavy node_modules directory, the source code, and the build tools) is discarded when the final image is created. This keeps your registry clean and your deployments fast.
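You can verify the size difference locally. The commands below are a sketch; the image names are placeholders, and --target lets you build just the first stage for comparison:

```shell
# Build the final (production) image
docker build -t myapp:prod .

# Build only the first stage so you can inspect its size
docker build --target builder -t myapp:builder .

# Compare the two images side by side
docker image ls myapp
```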

Can I use this for compiled languages like Go or Rust?

If you're working with compiled languages, the benefits are even more dramatic. For a Go application, your build stage requires the full Go toolchain, which is quite heavy. But once the binary is compiled, it's just a single executable file. You don't need the Go compiler to run a compiled binary.

A well-structured Go Dockerfile might look like this:

  1. Stage 1: Use golang:1.21-alpine to compile the source code.
  2. Stage 2: Use scratch or a minimal debian-slim image.
  3. Action: Copy only the compiled binary from the first stage to the second.
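The steps above can be sketched as a Dockerfile. The module layout, output path, and binary name here are placeholders; CGO_ENABLED=0 produces a statically linked binary so it can run on scratch:

```dockerfile
# Stage 1: compile a static binary
FROM golang:1.21-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Disable cgo so the binary has no dynamic library dependencies
RUN CGO_ENABLED=0 go build -o /out/server .

# Stage 2: ship only the binary, nothing else
FROM scratch
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
```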

Using the scratch image is the extreme version of this. scratch is an empty image. It contains zero files, zero libraries, and zero shell access. If your binary is statically linked, it can run on scratch, resulting in an image that is often under 20MB. This is the gold standard for security and efficiency. Keep in mind that scratch also lacks CA certificates and timezone data, so if your binary makes TLS calls, copy those files in from the build stage. For more details on the nuances of minimal images, the official Docker documentation provides deep technical specifics on how to structure these stages.

How do I handle build-time secrets safely?

One mistake people make when using multi-stage builds is trying to pass sensitive information (like an API key for a private registry) through build arguments. Even if you use a separate stage, that data can sometimes linger in the intermediate layers of your build cache. While multi-stage builds help keep secrets out of the final image, they don't automatically make your build history perfectly secure.

To handle this properly, use --mount=type=secret if you are using BuildKit. This allows you to mount a secret file during a specific RUN command without the secret being baked into any image layer. This is a much safer way to interact with private package managers or cloud credentials during the build process. You can check out the BuildKit documentation to understand how these advanced mounts work.
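Here is a sketch of what that looks like in practice. The secret id (npm_token) and file names are illustrative, and the first line opts in to the BuildKit Dockerfile syntax; the key point is that the secret is only mounted for the duration of that single RUN step and never written into a layer:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# The secret is available at /run/secrets/npm_token only during this RUN
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm install
```

At build time, you supply the secret from a local file without it ever entering the build context:

```shell
docker build --secret id=npm_token,src=./npm_token.txt .
```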

By adopting this pattern, you aren't just being "neat." You're building a more secure, faster, and more professional deployment pipeline. It's a small change in your Dockerfile that has a massive impact on the long-term health of your infrastructure.