Container security is one of those topics where the checklist format actually works — not because securing containers is simple, but because the failure modes are remarkably consistent across teams. In our experience working with cloud-native engineering teams, the same eight to ten misconfiguration patterns appear repeatedly, and most of them are detectable before a container ever reaches production. This is the checklist we wish existed when we were building detection infrastructure from scratch.
Base Image Hardening
The base image is where container security either starts well or starts in debt. Most teams inherit whatever base image the first engineer on the project chose, and that image often has not been revisited since. Here is what to check:
- Use minimal base images. Alpine Linux (typically 5–8 MB), distroless images (Google's distroless/base-nossl is around 2 MB), or Chainguard images reduce attack surface by cutting out packages and utilities an attacker could use post-compromise. Distroless images go further and ship without a package manager or shell. A standard Ubuntu or Debian base image includes 200–400 packages; distroless includes fewer than 20.
- Pin to a digest, not a tag. `FROM python:3.12-slim` will pull whatever image the registry serves at that tag, which changes silently when the maintainer pushes updates. `FROM python:3.12-slim@sha256:abc123...` is immutable. Pin digests in production Dockerfiles.
- Scan the base image on every build. Trivy or Grype can scan a base image layer in under 30 seconds in a CI pipeline. Set a policy: no new HIGH or CRITICAL CVEs in the base layer without an exception record. The actual block threshold depends on your reachability context: a HIGH CVE in a library your application never calls is a lower priority than a MEDIUM in one it calls on every request.
- Check for outdated OS packages. Even a pinned digest can harbor stale packages. Rebuild base image layers on a schedule — weekly at minimum for actively-developed services.
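The digest-pinning rule above is easy to check mechanically in CI. The following is a minimal sketch, assuming Dockerfiles with one instruction per line; the function name `unpinned_base_images` is illustrative, not part of any existing tool.

```python
import re

# Captures the image reference in a FROM line, skipping flags such as --platform.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return image references in FROM lines that lack a sha256 digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        # "FROM build" may refer to an earlier stage rather than a registry image.
        is_stage_ref = image.lower() in stage_names
        parts = line.split()
        if len(parts) >= 2 and parts[-2].lower() == "as":
            stage_names.add(parts[-1].lower())
        if image.lower() == "scratch" or is_stage_ref:
            continue
        if not DIGEST_RE.search(image):
            unpinned.append(image)
    return unpinned
```

A CI step could fail the build whenever the returned list is non-empty. The sketch does not resolve `ARG` substitutions in image names, so a line like `FROM ${BASE}` would be reported as unpinned.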
Image Build Security
The Dockerfile itself introduces risks that are distinct from the base image. We've seen hardcoded secrets in Dockerfile ARG instructions, build dependencies left in production images, and setuid binaries copied in from CI environments.
- Multi-stage builds are not optional. Build stages install compilers, test frameworks, linters, and development certificates. None of that belongs in your production image. A properly structured multi-stage Dockerfile copies only the compiled artifact or application code into the final stage. Production image size often drops 70–80% vs. single-stage builds, and the attack surface shrinks proportionally.
- No secrets in layers. Every RUN instruction that touches a secret leaves a trace in intermediate layers even if you delete the file in a subsequent layer. Use Docker BuildKit secrets (`--mount=type=secret`) or inject credentials at runtime via environment variables from a secrets manager, never in the Dockerfile.
- Run as non-root. Add a `USER` instruction. Applications that run as root inside a container can escalate privilege in certain kernel vulnerability scenarios. This is a 10-second Dockerfile fix with meaningful security value.
- Read-only filesystem where possible. Use `--read-only` at the container run level, or `readOnlyRootFilesystem: true` in your Kubernetes SecurityContext. Applications that legitimately need write access can use tmpfs mounts for specific directories. A read-only root filesystem stops attackers from modifying application files or writing new binaries to disk post-compromise.
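Two of the checks above, secret-looking build arguments and a missing `USER` instruction, lend themselves to a simple lint pass. The sketch below is a heuristic under stated assumptions: the name pattern in `SECRET_NAME_RE` is a hypothetical starting point, line continuations are not handled, and a stage with no `USER` instruction is treated as root even though some base images already set a non-root user.

```python
import re

# Hypothetical heuristic: ARG/ENV names that look like credentials.
SECRET_NAME_RE = re.compile(r"(PASSWORD|SECRET|TOKEN|API_?KEY)", re.IGNORECASE)


def lint_dockerfile(dockerfile_text: str) -> list[str]:
    """Return findings for secret-like ARG/ENV names and a root final stage."""
    findings = []
    last_user = None
    for number, raw in enumerate(dockerfile_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, rest = line.partition(" ")
        instruction = instruction.upper()
        if instruction == "FROM":
            last_user = None  # a new stage resets whatever USER came before
        elif instruction == "USER":
            last_user = rest.strip()
        elif instruction in ("ARG", "ENV"):
            # The variable name is everything before the first "=" or space.
            name = re.split(r"[=\s]", rest.strip(), maxsplit=1)[0]
            if SECRET_NAME_RE.search(name):
                findings.append(f"line {number}: {instruction} {name} looks like a secret")
    if last_user is None or last_user.split(":")[0] in ("root", "0"):
        findings.append("final stage runs as root (no non-root USER instruction)")
    return findings
```

Name matching will produce false positives and miss secrets passed under innocuous names, so it complements BuildKit secret mounts rather than replacing them.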
Registry and Supply Chain Controls
Image security does not end at build time. The registry is a critical control point, and supply chain compromise through container images is increasingly common. In 2025, CISA documented 47 confirmed incidents involving poisoned container images pulled from public registries.
- Use a private registry with scanning enabled. AWS ECR and GCP Artifact Registry both offer on-push scanning with configurable severity gates. Enable it. Every image that enters your registry should be scanned before it is eligible for deployment.
- Sign images. Sigstore/Cosign provides a lightweight signing workflow that integrates with most CI systems. Signed images can be verified at admission control — Kyverno and OPA/Gatekeeper both support signature verification policies that block unsigned images from deploying to production namespaces.
- Monitor for new CVEs in deployed images. A vulnerability in an image layer can be disclosed after the image is already running in production. Your scanning needs to run continuously against the registry — not just at build time — and alert when new findings emerge against deployed image digests.
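The continuous-monitoring item comes down to a set difference: compare the latest scan against the previous one, but only for digests that are actually deployed. Here is a minimal sketch; the input shapes (a set of digests and two digest-to-CVE mappings) are assumptions for illustration, not the output format of any particular scanner.

```python
def new_findings_for_deployed(deployed, previous_scan, latest_scan):
    """Return {digest: sorted new CVE IDs} for images that are currently running.

    deployed: set of image digests running in production.
    previous_scan, latest_scan: dicts mapping digest -> set of CVE IDs.
    """
    alerts = {}
    for digest in deployed:
        before = previous_scan.get(digest, set())
        after = latest_scan.get(digest, set())
        fresh = after - before  # findings disclosed since the last scan
        if fresh:
            alerts[digest] = sorted(fresh)
    return alerts
```

Images that sit in the registry but are not deployed are ignored here on purpose, which keeps alerts focused on running workloads.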
Kubernetes Security Context
This is where the largest concentrations of misconfiguration appear. We analyzed configurations across 500+ Kubernetes clusters and found that 67% had at least one privileged container running in a non-system namespace, and 41% had containers with allowPrivilegeEscalation: true in their SecurityContext.
| Misconfiguration | Frequency in our dataset | Risk |
|---|---|---|
| `privileged: true` | 67% | Full host access; container escape vector |
| `allowPrivilegeEscalation: true` | 41% | SUID binaries can escalate to root |
| Missing `runAsNonRoot: true` | 58% | Process runs as root if USER not set in image |
| `hostNetwork: true` | 23% | Container can access host network interfaces |
| `hostPID: true` | 18% | Container can access host process namespace |
| Missing seccomp profile | 79% | All syscalls permitted; broadens kernel attack surface |
The minimal SecurityContext for a typical application workload should look like:
```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```
Most applications run fine with this configuration. The exceptions — legacy services requiring specific capabilities — should be documented and reviewed on a policy cycle, not silently inherited.
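To find the workloads that deviate from this baseline, the same fields can be checked programmatically against container specs exported from the cluster. This is a minimal sketch that inspects only the container-level `securityContext`; settings inherited from the pod-level context (such as `runAsNonRoot` or `seccompProfile`) are not considered, and the function name is illustrative.

```python
def audit_security_context(container: dict) -> list[str]:
    """Return deviations from the minimal SecurityContext for one container spec."""
    ctx = container.get("securityContext") or {}
    problems = []
    if ctx.get("privileged") is True:
        problems.append("privileged: true")
    # Escalation is treated as allowed unless it is explicitly set to false.
    if ctx.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation not set to false")
    if ctx.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot not set to true")
    if ctx.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem not set to true")
    dropped = (ctx.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        problems.append("capabilities.drop does not include ALL")
    if not (ctx.get("seccompProfile") or {}).get("type"):
        problems.append("no seccomp profile")
    return problems
```

Running this over every container in a manifest gives a per-workload exception list, which is a reasonable input for the documented review cycle mentioned above.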
RBAC and Least-Privilege Access
Kubernetes RBAC is one of the most frequently misconfigured components in any cluster. The default behavior when teams bootstrap a new service is often to create a ServiceAccount with broad permissions and move on. That breadth persists indefinitely unless someone actively audits it.
- Audit existing ServiceAccount permissions. Run `kubectl auth can-i --list --as=system:serviceaccount:namespace:name` for each ServiceAccount in production. The output usually surprises people. We routinely see ServiceAccounts that can list secrets cluster-wide or create new RoleBindings: permissions left over from debugging sessions that became permanent.
- Scope Roles tightly. A microservice that needs to read from a ConfigMap does not need `get` on all resources in its namespace. Write the minimal Role, verify it in staging, and enforce it in production with a Kyverno policy.
- Avoid `cluster-admin` bindings outside system namespaces. If a workload has cluster-admin, a compromise of that workload is a cluster compromise. This is not hypothetical; it is the documented escalation path in multiple CISA advisories on Kubernetes-related incidents.
- Rotate service account tokens. If you are still using long-lived ServiceAccount tokens mounted as volumes, migrate to projected volumes with short-lived OIDC tokens. Kubernetes 1.24 stopped auto-generating long-lived Secret-based tokens for ServiceAccounts for a reason.
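The over-broad permissions described above follow a few recognizable patterns in Role and ClusterRole rules. The sketch below flags three of them in a list of RBAC rule dicts (the `rules` field of a Role); the categories and their wording are illustrative choices, not an exhaustive audit.

```python
def risky_rules(rules: list[dict]) -> list[str]:
    """Flag RBAC rules with wildcards, secret read access, or binding write access."""
    findings = []
    for rule in rules:
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        if "*" in verbs or "*" in resources:
            findings.append("wildcard verbs or resources")
        if "secrets" in resources and verbs & {"get", "list", "watch"}:
            findings.append("can read secrets")
        if resources & {"rolebindings", "clusterrolebindings"} and verbs & {"create", "update", "patch"}:
            findings.append("can modify role bindings")
    return findings
```

A Role that only grants `get` on a single named ConfigMap produces no findings, which matches the tight scoping recommended in the list.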
Network Policy and Traffic Isolation
By default, all pods in a Kubernetes cluster can communicate with all other pods across all namespaces. That flat network makes lateral movement trivial for an attacker who compromises any workload. Network policies let you change that: deny all ingress and egress by default, then permit only explicitly declared traffic flows. They only take effect when the cluster's network plugin enforces NetworkPolicy.
- Default-deny at the namespace level. A single NetworkPolicy that selects all pods and denies all ingress/egress establishes the baseline. Then add explicit allow rules for each service's actual communication requirements.
- Isolate by namespace. Separate customer-facing services from internal tooling at the namespace level and enforce namespace-scoped network policies. A compromise in a dev namespace should not provide lateral access to production workloads.
- Restrict egress to known endpoints. Services that only communicate with a database and an internal API do not need egress to arbitrary external IPs. Overly permissive egress is how exfiltration happens post-compromise.
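The default-deny baseline from the first item is a small manifest: an empty pod selector plus both policy types with no allow rules. A minimal sketch that generates it per namespace follows; the policy name `default-deny-all` and the helper name are assumptions for illustration.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty selector matches every pod in the namespace.
            "podSelector": {},
            # Both types are listed with no rules, so no traffic is allowed.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def render(namespace: str) -> str:
    """Serialize the policy as JSON, which kubectl accepts alongside YAML."""
    return json.dumps(default_deny_policy(namespace), indent=2)
```

Denying all egress also blocks DNS lookups, so the first explicit allow rule most namespaces need is egress to the cluster DNS service.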
Admission Control and Policy Enforcement
Checklists are useful. Policy-as-code enforcement is better. Admission controllers turn your security requirements into automated enforcement that cannot be bypassed by forgetting to check a box.
OPA/Gatekeeper and Kyverno are the two dominant policy engines for Kubernetes. Both can enforce image signature verification, SecurityContext requirements, required labels, registry allowlists, and resource limits. Both integrate with CI pipelines so policies can be tested before a manifest reaches the cluster.
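The decision an admission policy makes can be illustrated outside the cluster. The sketch below is a plain-Python stand-in for the logic of a registry allowlist and required-label policy, not Rego or Kyverno syntax; the registry prefix and label names are hypothetical.

```python
# Hypothetical policy inputs; a real cluster would express these in Rego or Kyverno YAML.
ALLOWED_REGISTRIES = ("registry.example.internal/",)
REQUIRED_LABELS = ("app", "owner")


def admit(pod: dict) -> tuple[bool, list[str]]:
    """Return (allowed, reasons) for a pod manifest given as a plain dict."""
    reasons = []
    labels = pod.get("metadata", {}).get("labels", {})
    for label in REQUIRED_LABELS:
        if label not in labels:
            reasons.append(f"missing required label: {label}")
    spec = pod.get("spec", {})
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    for container in containers:
        image = container.get("image", "")
        # str.startswith accepts a tuple and matches any of the prefixes.
        if not image.startswith(ALLOWED_REGISTRIES):
            reasons.append(f"image not from an allowed registry: {image}")
    return (not reasons, reasons)
```

Evaluating manifests this way in CI mirrors what the admission controller would do at deploy time, so a rejected manifest fails before it reaches the cluster.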
At Runtimekindle, we ship a starter policy library alongside our container scanning integration: 14 Rego/Kyverno policies covering the most common SecurityContext misconfigurations, registry restrictions, and NetworkPolicy baseline requirements. Teams can adopt the library as-is or fork it as the basis for their own policy set — the goal is to make the right configuration the default, not a manual checkbox every engineer needs to remember.
Continuous Monitoring in Production
A hardened image that passes all pre-deployment checks is not the end of the story. Running containers drift from their hardened baseline as new CVEs are disclosed, as configuration changes accumulate, and as teams patch only the workloads that are actively alerting. Continuous posture monitoring fills the gap between deploy-time scanning and the present state of your running workloads.
The metrics we track for container security posture: percentage of running containers with a critical CVE disclosed in the last 30 days, percentage of containers without a seccomp profile, percentage of ServiceAccounts with cluster-scoped permissions. These numbers move slowly for stable services and spike when teams ship new workloads without security review. Tracking them makes the drift visible before it becomes an incident.
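These three percentages can be computed from a simple inventory of running containers and ServiceAccounts. The following sketch assumes the inventory has already been collected into lists of dicts; the field names (`recent_critical_cve`, `seccomp_profile`, `cluster_scoped`) are hypothetical.

```python
def posture_metrics(containers: list[dict], service_accounts: list[dict]) -> dict:
    """Compute the three posture percentages from inventory records."""

    def pct(part: int, whole: int) -> float:
        # An empty inventory reports 0.0 rather than dividing by zero.
        return round(100.0 * part / whole, 1) if whole else 0.0

    recent_critical = sum(1 for c in containers if c["recent_critical_cve"])
    no_seccomp = sum(1 for c in containers if not c["seccomp_profile"])
    cluster_scoped = sum(1 for s in service_accounts if s["cluster_scoped"])
    return {
        "pct_containers_recent_critical_cve": pct(recent_critical, len(containers)),
        "pct_containers_without_seccomp": pct(no_seccomp, len(containers)),
        "pct_service_accounts_cluster_scoped": pct(cluster_scoped, len(service_accounts)),
    }
```

Recording these values on a schedule and plotting them over time is what makes the spikes from unreviewed workloads visible.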
Container security is not a one-time hardening exercise. It is a continuous practice — and the teams that treat it that way see measurably fewer incidents than those that treat it as a pre-deployment checklist to complete once and forget.