
Container Security

Hardening practices that reduce the attack surface of container builds, images, and runtime settings.

Container security addresses threats that exist across the full lifecycle, from the packages that enter the image to the privileges granted at runtime to the trust placed in external registries. Attackers who compromise a container aim to break out of its isolation, access secrets, or pivot to other workloads; good container security limits what they can do if they get in. The model starts at build time and extends all the way through deployment and runtime operation.

Learning objectives

What you should be able to do after reading.
  • Identify the main container security controls across build time and runtime.
  • Explain why image size, build structure, and privilege level matter.
  • Recognize the most common ways containers are misconfigured.

At a glance

Fast mental model before you dive in.
Build hygiene
  • Small base images
  • Multi-stage builds
  • SBOMs
Runtime limits
  • Non-root users
  • Rootless mode
  • Scoped mounts
Supply-chain proof
  • Image scanning
  • Image signing
  • Trusted registries

Core idea

A secure container is narrow, explicit, and easy to inspect. Narrow means the image contains only what the application needs to run. Explicit means runtime privileges, capabilities, and mounts are declared and justified. Easy to inspect means no hidden layers, no undocumented dependencies, and no surprise behavior when the container starts.

The principle of least privilege applies at every layer. The base image should have minimal packages, the process should run as a low-privilege user, the container should have only the Linux capabilities it genuinely needs, and the mounts and network paths should be scoped to what the workload actually uses. The goal is to make the container a poor environment for an attacker even if they gain code execution inside it.

Defense in depth for containers means that no single control is relied on exclusively. Image scanning, signing, runtime security policies, and admission controls are independent layers. If one is bypassed or fails, others still limit the damage. This stacked model is what 'keeping the blast radius small' actually looks like in practice.

The most important mental shift is treating security as a build property, not an operational afterthought. A container that enters production insecure cannot be made secure by monitoring alone. The controls that matter most (image hygiene, build discipline, and runtime policy) must be in place before the container is deployed.

Build-time controls

  • Start from a minimal base image (slim, alpine, distroless) and add only what is genuinely required; every extra package is a potential vulnerability.
  • Use multi-stage builds to compile or prepare artifacts in a build environment and copy only the final result into the runtime image, keeping build tools out of production (a Dockerfile sketch follows this list).
  • Generate and retain an SBOM (software bill of materials) for each image so you know exactly what packages and versions are inside.
  • Never write secrets, tokens, or credentials into image layers: not in ENV, ARG, RUN, or COPY instructions.
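
A minimal sketch of these practices combined, assuming a Go service whose entry point lives at ./cmd/server (image names and paths are illustrative):

    # Build stage: full toolchain, never shipped to production
    FROM golang:1.22-alpine AS build
    WORKDIR /src
    COPY go.mod go.sum ./
    RUN go mod download
    COPY . .
    RUN CGO_ENABLED=0 go build -o /server ./cmd/server

    # Runtime stage: distroless image with only the compiled binary
    FROM gcr.io/distroless/static-debian12:nonroot
    COPY --from=build /server /server
    ENTRYPOINT ["/server"]

The runtime stage has no shell, no package manager, and no compiler, and the :nonroot tag runs the process as an unprivileged user by default.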

Runtime controls

  • Run as a non-root user inside the container unless there is a specific, documented reason not to.
  • Use rootless container runtimes (such as Podman or rootless Docker) where the platform supports it, to reduce the privileges of the runtime itself.
  • Set a read-only root filesystem and mount writable paths only where the application explicitly needs to write.
  • Drop all Linux capabilities not required by the application and never use privileged: true in production unless the workload genuinely needs kernel-level access (a Kubernetes sketch of these settings follows this list).
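
A sketch of these runtime settings as a Kubernetes pod spec (the image name, UID, and writable path are illustrative assumptions):

    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:2.4.1
          securityContext:
            runAsNonRoot: true
            runAsUser: 10001              # unprivileged UID
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]               # add back only what is justified
          volumeMounts:
            - name: tmp
              mountPath: /tmp             # the only writable path
      volumes:
        - name: tmp
          emptyDir: {}

If the application needs a specific capability (for example NET_BIND_SERVICE to bind a port below 1024), add it explicitly under capabilities.add rather than running privileged.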

Signals to watch for

Patterns worth investigating further.
  • Secrets or credentials copied into image layers.
  • A container that can write to more of the filesystem than it needs.
  • Images pulled from untrusted or unreviewed sources.
  • Build steps that leave compilers, package managers, or debug tools in production images.

DEEP DIVE

Minimal images

Every package in a container image is a potential vulnerability. A full Ubuntu or Debian image ships with a shell, package managers, network utilities, and hundreds of other binaries that a typical application never uses. Any of these can be leveraged by an attacker who achieves code execution to escalate privileges, establish persistence, or move laterally. Minimal images remove the tools the attacker would reach for.

The spectrum runs from full-fat images (Ubuntu, Debian, CentOS) through slimmed distro images (debian-slim, alpine) to distroless images (containing only the language runtime and standard libraries) and finally to scratch-based images (containing only the application binary and its direct dependencies). Each step reduces both image size and attack surface, but also increases the effort required to debug running containers.

A practical tool for understanding what is in an image is 'dive', which lets you inspect each layer and identify files that are unexpectedly large, hidden by later layers, or simply unnecessary. Many teams are surprised to discover that a layer buried deep in their image contains the entire source code checkout, a package manager cache, or an undeleted credential file from a build step.
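
Dive is invoked against any local or pullable image, for example (tool installation assumed, image name illustrative):

    $ dive myapp:2.4.1

The interactive view shows each layer's command alongside the resulting filesystem tree, flagging files added, modified, or deleted by that layer. Setting CI=true runs dive non-interactively against efficiency thresholds defined in a .dive-ci file, which some teams wire into their build pipelines.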

The trade-off against minimal images is debuggability. When a production container misbehaves, having a shell inside makes investigation easier. The modern answer to this is ephemeral debug containers (kubectl debug in Kubernetes, docker debug locally), which attach a separate debug-enabled container to a running minimal container without changing the production image itself.
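
For example, attaching an ephemeral debug container to a running pod in Kubernetes (pod and container names are illustrative):

    $ kubectl debug -it mypod --image=busybox --target=app

The --target flag shares the process namespace with the named application container, so the debug shell can inspect its processes without the production image ever containing a shell itself.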

Multi-stage builds

A multi-stage build uses multiple FROM instructions in a single Dockerfile. Each FROM starts a new build stage with its own filesystem. You compile or prepare the application in an early stage (with a full compiler toolchain, package manager, and build dependencies), then COPY only the final artifact into a clean runtime stage. The runtime stage has no compiler, no package manager, and no build cache, just what is needed to run.

This matters for security because an attacker who gains code execution inside the container cannot use tools that are not there. In a Go application, for example, the build stage has the Go compiler, gcc, and potentially dozens of build-time dependencies. The runtime stage has just the compiled binary. If the application is compromised, the attacker cannot use apt-get to install tools, cannot compile new binaries, and has no shell unless the base image provides one.

Multi-stage builds also prevent accidental secret leakage. Build stages often need credentials to access private package registries or internal APIs. If the build runs in a single stage, those credentials may linger in environment variables or intermediate files. In a multi-stage build, only explicitly COPY-ed files cross the stage boundary. Credentials passed to the build stage do not automatically appear in the runtime image.

The pattern is standard practice for compiled languages (Go, Rust, Java, .NET) but is equally useful for interpreted languages. A Node.js multi-stage build can run npm install in a stage with the full npm toolchain, then copy only node_modules and the application source into a node:alpine or distroless runtime stage, excluding devDependencies, npm itself, and the npm cache.
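
A sketch of that Node.js pattern, assuming a build script that emits the application into dist/ (names are illustrative):

    FROM node:20-alpine AS build
    WORKDIR /src
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build && npm prune --omit=dev

    FROM node:20-alpine
    WORKDIR /app
    COPY --from=build /src/node_modules ./node_modules
    COPY --from=build /src/dist ./dist
    USER node       # unprivileged user shipped with the official node images
    CMD ["node", "dist/server.js"]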

Non root and rootless

Running a process as root inside a container means the process has UID 0 with a broad set of Linux capabilities within its namespace. While container isolation prevents that from being directly equivalent to host root, it significantly reduces the attacker's work if isolation breaks. Many container escape exploits, including kernel vulnerabilities and socket mount attacks, are easier or only possible when the container process runs as root.

Setting a non-root user is straightforward in a Dockerfile. Use the USER instruction to switch to a named user or UID. The application must be able to run as that user, which means file permissions in the image need to be set correctly and the process cannot bind to ports below 1024 without additional capability grants. These constraints are rarely difficult to meet and are almost always worth the effort.
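
A sketch of the pattern in a Dockerfile (user name and paths are illustrative):

    FROM debian:12-slim
    # Create a dedicated system user with a fixed UID
    RUN groupadd --system app && useradd --system --gid app --uid 10001 app
    COPY --chown=app:app ./server /app/server
    # Use the numeric UID so orchestrators can verify non-root without resolving names
    USER 10001
    ENTRYPOINT ["/app/server"]

Using the numeric UID rather than the name matters in Kubernetes, where a runAsNonRoot policy can only be verified against a numeric user.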

Rootless mode goes further than running as non-root inside the container. In rootless Docker or Podman, the container runtime itself runs as an unprivileged user on the host. It is not started by root and does not require root to function. This means that even if the runtime is compromised, the attacker starts from a non-privileged position on the host. The trade-off is that some features (binding to low ports, some storage drivers, certain network modes) are limited or unavailable.
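
One way to see the difference, assuming rootless Podman and a typical /etc/subuid allocation (the exact ranges vary by host):

    $ podman unshare cat /proc/self/uid_map
             0       1000          1
             1     100000      65536

UID 0 inside the user namespace maps to the invoking user (here 1000) on the host, and all other container UIDs map into an unprivileged subordinate range, so 'root in the container' is not root on the host.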

A common misunderstanding is that non-root inside the container makes the host safe from compromise. It reduces risk significantly but is not a complete boundary. A container running as UID 1000 inside but mapped to a privileged host user through UID remapping, or a container with added Linux capabilities, can still be dangerous. Non-root is one layer in the defense model, not the whole model.

Secrets in containers

The most dangerous place to put a secret is in an image layer, because image layers are immutable, widely cached, and often pulled by many systems. Even if a secret is set in an early layer and a later layer attempts to delete it, the secret remains in the image history and is visible via 'docker history' or by extracting the image filesystem. Credentials stored this way have a long effective lifetime even after rotation, because old images may persist in registries and build caches.
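
This is easy to demonstrate. Assuming a hypothetical image (leaky:latest) built with an inlined credential, the value is recoverable from metadata alone:

    $ docker history --no-trunc leaky:latest | grep TOKEN
    ... /bin/sh -c #(nop)  ENV API_TOKEN=hunter2 ...
    $ docker inspect --format '{{.Config.Env}}' leaky:latest
    [API_TOKEN=hunter2 PATH=...]

Anyone who can pull the image can run these commands; deleting the file or unsetting the variable in a later layer changes nothing.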

Build-time secrets (credentials needed during the build but not at runtime) should be passed using Docker BuildKit's --secret flag, which mounts the secret as a temporary file available only during that specific RUN instruction and never added to any image layer. Runtime secrets should be injected as environment variables by the orchestrator, mounted from a Kubernetes Secret, or fetched dynamically from a secrets manager like HashiCorp Vault or AWS Secrets Manager.
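
A sketch of the BuildKit pattern, assuming a private package index authenticated via a .netrc file (file names and the secret id are illustrative):

    # syntax=docker/dockerfile:1
    FROM python:3.12-slim
    WORKDIR /app
    COPY requirements.txt .
    # The secret exists only for the duration of this RUN instruction
    RUN --mount=type=secret,id=netrc,target=/root/.netrc \
        pip install -r requirements.txt
    COPY . .

Built with:

    $ docker build --secret id=netrc,src=$HOME/.netrc -t myapp .

The credential is mounted into the build container for that one step and is never written to a layer, the image metadata, or the build cache.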

The ENV instruction is persistent. Its values appear in the image metadata and are visible to anyone who can inspect or pull the image. The ARG instruction is slightly better (values are not in the final image metadata by default) but still appears in the build history. Neither should be used for real credentials. The correct pattern is no secret in the image, full stop.

A common source of leakage is CI/CD pipelines where secrets are echoed in build logs, captured in artifact uploads, or exposed in error messages. Container build logs should be treated as potentially public, and secret values should be masked by the CI system. After any suspected exposure (in a log, an image layer, a commit), the secret must be rotated immediately, because it should be assumed to have been seen.

Image scanning

Image scanners inspect the contents of a container image against databases of known vulnerabilities. Most scanners look at OS-level packages (dpkg, rpm) and language-level packages (npm, pip, maven) and match their versions against CVE databases like the National Vulnerability Database (NVD) and vendor-specific advisories. Some scanners also check for misconfiguration patterns, exposed secrets, and overly permissive file permissions.
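
For example, with Trivy (one widely used open-source scanner; the image name is illustrative):

    $ trivy image registry.example.com/myapp:2.4.1
    $ trivy image --severity HIGH,CRITICAL --exit-code 1 registry.example.com/myapp:2.4.1

The second form is the common CI gate: it exits non-zero when findings at or above the chosen severity exist, failing the build.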

A scan result is a signal, not a verdict. A finding that says 'critical CVE' does not automatically mean the application is exploitable. The CVE may be in a library component that is never called by the application, or the affected code path may not be reachable from user input. Reachability analysis (determining whether the vulnerable code is actually executed in context) is increasingly available in newer scanning tools but still requires human judgment to act on correctly.

Base image age is one of the most important factors in scan results. An image built on Ubuntu 20.04 six months ago will have accumulated all the package vulnerabilities reported in that time, even if the application code has not changed. Regularly rebuilding images from updated base images (automated in CI) is one of the most effective ways to keep scan results manageable and reduce the real vulnerability exposure of running containers.

Scanning should happen at multiple points. During the build (to catch issues before they reach a registry), in the registry (to catch newly disclosed vulnerabilities in images that were clean when built), and optionally at deploy time (to enforce a policy before a container starts). The most common mistake is scanning only at build time and treating a clean result as permanent. CVEs are disclosed continuously and an image clean today may have known vulnerabilities tomorrow.

Image signing

Image signing uses cryptographic signatures to prove that a specific image was produced by a specific party and has not been modified since it was signed. Without signing, a deployment system pulling 'myapp:2.4.1' has no way to verify whether it is the image the build pipeline produced, or one that was substituted, intentionally or through a registry compromise, after the fact.

Cosign (from the Sigstore project) is the modern standard for container image signing. It stores signatures in the same OCI registry as the image and integrates with tools like Rekor for a transparent, append-only public log. The signing key can be a traditional key pair or a short-lived identity from an OIDC provider like GitHub Actions, which avoids the need to manage long-lived signing keys in CI pipelines.
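
The key-based workflow looks like this (registry and image names are illustrative; keyless OIDC signing replaces the key flags entirely):

    $ cosign generate-key-pair
    $ cosign sign --key cosign.key registry.example.com/myapp:2.4.1
    $ cosign verify --key cosign.pub registry.example.com/myapp:2.4.1

Cosign resolves the tag to a digest and signs that digest, so the signature is valid only for the exact image content it was created against.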

The value of signing is realized at enforcement time. An admission controller in Kubernetes (such as Sigstore's policy-controller or Connaisseur) can verify that every image being deployed has a valid signature from the trusted pipeline before allowing the container to start. This prevents deploying unsigned images, images from untrusted registries, or images whose signatures do not match the expected key.
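
A sketch of such a policy for Sigstore's policy-controller, assuming key-based signing (the registry glob and key are illustrative; field names follow the v1beta1 API):

    apiVersion: policy.sigstore.dev/v1beta1
    kind: ClusterImagePolicy
    metadata:
      name: require-pipeline-signature
    spec:
      images:
        - glob: "registry.example.com/**"
      authorities:
        - key:
            data: |
              -----BEGIN PUBLIC KEY-----
              ...
              -----END PUBLIC KEY-----

With this in place, a pod whose image matches the glob is admitted only if a valid signature from the corresponding private key exists in the registry.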

A common misunderstanding is conflating signing with scanning. Signing answers 'who built this image and is it unmodified?'. Scanning answers 'does this image contain known vulnerabilities?'. Both are needed, and neither replaces the other. A signed image can contain critical CVEs; a clean-scanning image may not have been signed by a trusted pipeline. Defense in depth means running both controls.

SBOM

A software bill of materials (SBOM) is a machine-readable inventory of every component in a software artifact. For a container image, this means all OS packages, language libraries, and their exact versions. SBOM formats include SPDX (a Linux Foundation standard) and CycloneDX (used widely in security tooling). Tools like Syft can generate an SBOM from any container image by inspecting its layers.
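
Generating one is a single command with Syft (the image name is illustrative):

    $ syft registry.example.com/myapp:2.4.1 -o spdx-json > sbom.spdx.json
    $ syft registry.example.com/myapp:2.4.1 -o cyclonedx-json

The output lists every detected OS and language package with its name, version, and origin, in whichever of the two formats downstream tooling expects.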

SBOMs enable faster incident response when a new vulnerability is disclosed. When Log4Shell was announced in December 2021, organizations with SBOMs for all their images could immediately query which containers contained log4j and what version, in minutes rather than days. Organizations without SBOMs had to manually inspect images or scan everything reactively, often under time pressure with incomplete results.

Beyond incident response, SBOMs support license compliance review (ensuring that all included packages have acceptable licenses), audit requests from customers or regulators, and supply chain transparency requirements like those in the US Executive Order 14028 on Improving the Nation's Cybersecurity, which requires SBOMs for software sold to the federal government.

An SBOM is only as useful as the process that keeps it current and accessible. An SBOM generated at build time and stored with the image in the registry can be retrieved by any authorized system. Organizations that generate SBOMs but store them somewhere disconnected from the images they describe, or that generate them only on request, lose most of the operational benefit. The SBOM should be a first-class artifact of the build pipeline, attached to every published image.
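
One way to keep the SBOM attached to the image is a signed attestation in the same registry, sketched here with Cosign (names are illustrative, and this assumes the key-based setup from the signing section):

    $ syft registry.example.com/myapp:2.4.1 -o spdx-json > sbom.spdx.json
    $ cosign attest --key cosign.key --type spdxjson \
        --predicate sbom.spdx.json registry.example.com/myapp:2.4.1

Any authorized system can then fetch and verify the SBOM with cosign verify-attestation, with no separate storage to keep in sync.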

Common misconfigurations

Privileged containers (privileged: true in Kubernetes, --privileged in docker run) grant the container almost all host Linux capabilities and remove most kernel protections. They are rarely necessary (usually a specific capability or device access would suffice) but are often added as a shortcut to fix a permission problem quickly. A privileged container is effectively running on the host: if it is compromised, the attacker has host-level access.

Broad hostPath mounts mount a directory from the host node directly into the container. Commonly misused patterns include mounting /var/run/docker.sock (giving the container full control of the container runtime on that node), /proc, /sys, or even / (the entire host filesystem). Any of these gives a compromised container a path to escape isolation and control the underlying host.

hostNetwork: true makes the container share the host's network namespace. The container can see all network traffic on the host, listen on host ports, and bypass network policies designed to isolate workloads from each other. This is sometimes used for performance or compatibility but is a significant security boundary reduction.

Floating image tags (using 'latest' or a moving version tag without digest pinning) mean that a deployment can silently pull a different image than intended. Combined with an insecure or unauthenticated registry, this creates a path for image substitution attacks. Pinning images by digest, not tag, eliminates this risk. Running containers from untrusted or unverified registries, without any signing or policy enforcement, leaves the full image supply chain as an open attack vector.
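
The difference in a deployment manifest (the digest shown is a placeholder, not a real value):

    # Mutable: what this resolves to can change between pulls
    image: registry.example.com/myapp:2.4.1

    # Immutable: content-addressed, cannot be silently substituted
    image: registry.example.com/myapp@sha256:<image-digest>

The digest of a local image can be read with docker inspect --format '{{index .RepoDigests 0}}' myapp:2.4.1 after a push or pull.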