Container Security, Oct 27, 2025

Container Image Vulnerability Triage: Why Layer Attribution Changes Everything


Container image scanning is now standard practice. Most teams running K8s have it wired into CI/CD in some form — a Trivy scan, a registry integration, an inline check in the GitHub Actions workflow. The problem isn't running the scanner. The problem is what happens after it outputs 200 findings and the team has two sprint-hours budgeted for security work.

Without a structured triage model, the findings pile up. Engineers learn to scroll past the scanner output. The backlog grows until a security audit asks about it. This isn't a culture problem; it's a signal-to-noise problem. A container image built on a standard Debian base image will often carry 60-100 OS-level CVEs from packages installed when the base image was created, most of which have no bearing on your application's actual attack surface. Treating that list as uniformly actionable is how you burn out your engineering team on false urgency.

Layer Attribution: The First Filter That Actually Works

Dockerfile layer analysis is the most underused capability in container image scanning. Every CVE finding has a source layer — either the base image, an intermediate build step, or your application layer. That layer attribution completely changes the remediation path.

A CVE in the base image layer (the FROM debian:bookworm-slim line) can only be fixed by updating the base image tag or switching to a different base. Your application code didn't introduce it, and changing your application code won't fix it. The correct remediation is a base image rebuild — and if you're pinning to a digest rather than a tag, you need to update that digest and re-test. This is a DevOps task, not an application engineering task.

A CVE in an intermediate build layer — a RUN apt-get install curl step, for example — is fixed by updating that specific package or removing the install step if the package isn't needed at runtime. This is a Dockerfile maintenance task.

A CVE in your application layer — in a dependency you explicitly declared in package.json, go.mod, or requirements.txt — is a dependency upgrade task for the application team. This is where engineering attention should go first, because this is the layer you fully control and where the fix is within a sprint's reach.

Triaging by layer first gives you three buckets with different owners and different urgency levels, rather than one undifferentiated list that no one knows how to prioritize.
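The three-bucket split can be sketched in a few lines of Python. This is a minimal illustration, not a scanner integration: it assumes each finding has already been annotated with the index of the image layer that introduced it and with a flag saying whether the package came from an application manifest. The field names (`layer_index`, `from_manifest`), the owner labels, and the placeholder CVE IDs are all hypothetical.

```python
from collections import defaultdict

# Hypothetical owner labels for the three buckets described above.
OWNERS = {
    "base": "devops",
    "intermediate": "platform",
    "application": "app-team",
}

def classify_layer(finding, base_layer_count):
    """Return the bucket name for one finding.

    Assumed keys on `finding`:
      layer_index   - 0-based index of the layer that introduced the package
      from_manifest - True if the package was declared in package.json, go.mod, etc.
    """
    if finding.get("from_manifest"):
        return "application"
    if finding["layer_index"] < base_layer_count:
        return "base"
    return "intermediate"

def bucket_findings(findings, base_layer_count):
    """Group CVE IDs by bucket."""
    buckets = defaultdict(list)
    for f in findings:
        buckets[classify_layer(f, base_layer_count)].append(f["cve"])
    return dict(buckets)

# Placeholder findings; the CVE IDs are not real.
findings = [
    {"cve": "CVE-0000-0001", "layer_index": 0, "from_manifest": False},
    {"cve": "CVE-0000-0002", "layer_index": 3, "from_manifest": False},
    {"cve": "CVE-0000-0003", "layer_index": 5, "from_manifest": True},
]
buckets = bucket_findings(findings, base_layer_count=2)
for name, cves in buckets.items():
    print(f"{name} -> {OWNERS[name]}: {cves}")
```

In practice the layer index would have to be derived from whatever layer identifier your scanner reports per finding, and `base_layer_count` from the layer list of the base image itself.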

CVSS Is Not a Triage Strategy

The instinct to sort findings by CVSS score and work from the top of the list is understandable but incomplete. CVSS base score measures the theoretical worst-case impact and exploitability in a vacuum — it doesn't account for your specific deployment context. A CVSS 9.8 network-reachable buffer overflow in a package that your container image includes but your application code never loads at runtime is less urgent than a CVSS 7.2 vulnerability in a library that handles authenticated user requests on every API call.

EPSS (Exploit Prediction Scoring System) adds a useful dimension: the probability that a specific CVE will be exploited in the wild in the next 30 days, based on observed exploitation patterns. A CVSS 9.8 with EPSS 0.03 (3% exploitation probability) is materially different from a CVSS 7.5 with EPSS 0.54. EPSS scores are updated daily by FIRST and are freely available via the FIRST API or bundled into Grype's vulnerability database. Any scanner that doesn't surface EPSS scores alongside CVSS is giving you an incomplete triage picture.

We're not saying CVSS scores are useless. They remain the right signal for impact severity — a CVSS 9.8 that does get exploited will hurt more than a CVSS 5.4. The point is that CVSS alone, without EPSS and without runtime reachability context, is an unreliable prioritization signal for a team with finite patch capacity.
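One way to combine the three signals is a simple sort key. The sketch below is an assumption about how a team might order its list, not a standard formula: it ranks findings that are loaded at runtime first, then by EPSS, then by CVSS. The `loaded` flag and the placeholder CVE IDs are hypothetical; the score pairs mirror the examples in the text.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve: str
    cvss: float   # CVSS base score, 0.0 to 10.0
    epss: float   # estimated probability of exploitation in the next 30 days
    loaded: bool  # assumed flag: was the vulnerable library observed loaded at runtime?

def triage_key(f):
    # Tuples compare element by element: reachability, then likelihood, then impact.
    return (f.loaded, f.epss, f.cvss)

def rank(findings):
    """Return findings ordered from most to least urgent under this policy."""
    return sorted(findings, key=triage_key, reverse=True)

findings = [
    Finding("CVE-0000-0001", cvss=9.8, epss=0.03, loaded=False),
    Finding("CVE-0000-0002", cvss=7.5, epss=0.54, loaded=True),
    Finding("CVE-0000-0003", cvss=9.8, epss=0.03, loaded=True),
]
ranked = rank(findings)
print([f.cve for f in ranked])
```

With this ordering the CVSS 7.5 finding with EPSS 0.54 lands ahead of both CVSS 9.8 findings, and the 9.8 that is never loaded lands last. A strict lexicographic order is a blunt instrument; a weighted score is an equally reasonable choice.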

Registry Scanning vs CI Pipeline Scanning: When Each Matters

Registry scanning and CI pipeline scanning are complementary, not interchangeable. The failure mode of relying exclusively on CI-time scanning is that vulnerabilities disclosed after the image is built don't trigger a re-scan — your image sits in ECR or GCR accumulating newly-disclosed CVEs that no one sees until the next rebuild. A service running an image built six months ago may have a dozen new critical findings that exist only in the registry record, not in any CI pipeline output.

Registry scanning solves this by continuously re-evaluating stored images against the current vulnerability database, independent of CI activity. ECR's enhanced scanning (powered by Amazon Inspector) and Google Cloud's Artifact Analysis (formerly Container Analysis) both provide this capability. The integration pattern for K8s environments is: registry scanner detects a new critical CVE in a deployed image → fires a webhook to your SIEM or alerting system → creates a ticket in your vulnerability tracking system with the affected service and deployment namespace.
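The last hop of that pattern, turning a scan event into tickets, might look like the sketch below. The event shape, the `deployed` lookup table, and the service and namespace names are hypothetical; real registry scanners emit their own payload formats, and the mapping from image digest to running workloads would come from your cluster inventory.

```python
def tickets_for_event(event, deployed):
    """Turn one registry scan event into zero or more tickets.

    Assumed shapes (illustrative, not a real scanner payload):
      event    = {"image_digest": str, "cve": str, "severity": str}
      deployed = {image_digest: [(namespace, service), ...]}
    """
    # Only critical findings on images that are actually deployed create tickets.
    if event["severity"] != "CRITICAL":
        return []
    tickets = []
    for namespace, service in deployed.get(event["image_digest"], []):
        tickets.append({
            "title": f"{event['cve']} in {service}",
            "namespace": namespace,
            "service": service,
            "image_digest": event["image_digest"],
        })
    return tickets

# Placeholder inventory and event.
deployed = {
    "sha256:aaaa": [("payments", "ledger-api"), ("payments-staging", "ledger-api")],
}
event = {"image_digest": "sha256:aaaa", "cve": "CVE-0000-0001", "severity": "CRITICAL"}
tickets = tickets_for_event(event, deployed)
for ticket in tickets:
    print(ticket["title"], "in namespace", ticket["namespace"])
```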

CI pipeline scanning solves a different problem: it prevents new vulnerabilities from entering the registry in the first place. A pipeline gate that blocks merge when a critical CVE is introduced by a new dependency or a base image downgrade stops the problem at the source, before it's deployed anywhere. The operational requirement is keeping the scan fast enough that it doesn't add meaningful latency to the pipeline — Trivy scanning a typical Go microservice image runs in 8-12 seconds against a warm database cache, which is acceptable.
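A gate that blocks only on newly introduced criticals can be sketched as a small post-processing step over the scanner's JSON report. The example assumes a Trivy-style report layout (`Results`, `Vulnerabilities`, `VulnerabilityID`, `Severity`); check the field names against the output of the scanner version you run. The baseline set and the placeholder CVE IDs are hypothetical.

```python
import json

def critical_ids(report):
    """Collect the IDs of critical findings from a Trivy-style JSON report."""
    ids = set()
    for result in report.get("Results", []):
        # A target with no findings may omit "Vulnerabilities" or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") == "CRITICAL":
                ids.add(vuln["VulnerabilityID"])
    return ids

def gate(new_report, baseline_ids):
    """Return (passed, introduced): fail only on criticals absent from the baseline."""
    introduced = critical_ids(new_report) - set(baseline_ids)
    return (len(introduced) == 0, sorted(introduced))

# Placeholder report; in CI this would be loaded from the scanner's JSON output file.
sample = json.loads("""
{"Results": [
  {"Target": "app (alpine)", "Vulnerabilities": [
    {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample", "Severity": "CRITICAL"},
    {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libother", "Severity": "HIGH"}
  ]},
  {"Target": "go.mod"}
]}
""")
passed, introduced = gate(sample, baseline_ids={"CVE-0000-0009"})
print("gate passed" if passed else f"blocked by {introduced}")
```

In a pipeline, the calling script would exit non-zero when `passed` is false. Trivy can also fail a build directly through its `--severity` and `--exit-code` options; the baseline comparison is for the case where existing criticals are already tracked and only regressions should block.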

The right architecture is both: pipeline gating for new vulnerabilities at build time, registry scanning for vulnerability disclosures that affect already-deployed images. Teams that only run CI scanning are flying blind on their deployed fleet.

The Noise Problem and What Actually Generates It

Sixty to seventy percent of the findings in a typical Debian-based container image scan come from OS packages that are present in the base image but have no relationship to your application's functionality. The perl interpreter, util-linux utilities, and gcc build tools that made it into a non-distroless runtime image all generate findings that you can't patch without rebuilding the base, and many of them represent vulnerabilities in code your application never calls.

Distroless images are the structural fix. Google's gcr.io/distroless/base images ship with only the runtime dependencies the application needs: no package manager, no shell, no debug utilities. A distroless Go image might have 3-5 OS-level findings versus 60-80 for the equivalent Debian slim image. The tradeoff is operational: distroless images are harder to debug (no shell to exec into), which requires moving debug tooling to sidecar containers or to cluster-level debug pods rather than the application container itself.

For teams not ready to move to distroless, the intermediate approach is aggressive base image slimming: start from alpine:3.x (typically 2-5 OS CVEs versus 60+ for Debian), audit your RUN install steps to remove packages that aren't needed at runtime, and pin to a specific alpine digest rather than a floating tag so your base image doesn't silently change between builds.
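Digest pinning is easy to check mechanically. The sketch below is a simplified lint, assuming one instruction per line and no build arguments in the image reference. It lists every external FROM reference that lacks an `@sha256:` digest. The image tags and the all-`a` digest are placeholders.

```python
import re

# A pinned reference ends in @sha256: followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")

def unpinned_bases(dockerfile_text):
    """Return FROM image references that are not pinned to a sha256 digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "scratch" and earlier build stages are not external images.
        is_external = image != "scratch" and image not in stage_names
        if is_external and not DIGEST_RE.search(image):
            unpinned.append(image)
        # Remember the alias from "FROM image AS name" for later stages.
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2])
    return unpinned

placeholder_digest = "a" * 64  # not a real image digest
dockerfile = f"""FROM golang:1.22 AS build
RUN go build -o /app .
FROM alpine:3.20@sha256:{placeholder_digest}
COPY --from=build /app /app
"""
print(unpinned_bases(dockerfile))
```

Here the build stage is flagged because it uses a floating tag, while the runtime stage passes because it carries a digest.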

A Triage Workflow That Engineering Teams Can Actually Follow

A fintech platform team operating 55 microservices across EKS in two AWS regions implemented the following triage framework after their scanner output became too large to manually review each sprint. Their scanner output averaged 280 findings per weekly scan across all services.

Step 1: Strip base image findings that have no upstream fix available. CVEs in OS packages with "will not fix" status from the distro maintainer are not actionable on your sprint timeline; they belong in a deferred list reviewed quarterly when evaluating base image alternatives. This eliminated 90 of 280 findings in their case, leaving 190.

Step 2: Filter to EPSS ≥ 0.10 within the remaining set. Below 10% exploitation probability, vulnerabilities require special justification to prioritize over feature work. This brought the list to 65 findings.

Step 3: Apply runtime reachability data. Cross-reference the 65 findings against eBPF observations of which libraries are actually loaded at runtime. Libraries with zero load events over the preceding 30 days move to an "acknowledged, not loaded" queue. This reduced the active triage set to 18 findings.

Step 4: Assign by layer owner. Of the 18 findings: 7 are in application dependencies (engineering team, priority 1), 8 are base image CVEs with upstream fixes available (DevOps team, pin update, priority 2), 3 are in intermediate Dockerfile install steps (platform team, Dockerfile refactor, priority 3).

The result: 280 scanner findings became 18 triage items with clear owners and priority ordering in under 20 minutes of triage time, compared to 3+ hours previously. The 262 deferred or acknowledged findings are tracked, not ignored — they have VEX-style justifications attached and are re-evaluated when their status changes.
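The four steps map onto a single pass over the findings. The sketch below is one possible encoding under stated assumptions, not the team's actual tooling: every finding is assumed to carry a layer label, a fix-availability flag, an EPSS score, and a 30-day load-event count, and all field names, queue names, and CVE IDs are hypothetical.

```python
EPSS_THRESHOLD = 0.10  # step 2 cutoff from the workflow above

# Step 4 priority by layer owner: application first, then base, then intermediate.
LAYER_PRIORITY = {"application": 1, "base": 2, "intermediate": 3}

def triage(findings):
    """Sort findings into deferred queues and an ordered active list."""
    queues = {"deferred_no_fix": [], "below_epss": [], "not_loaded": [], "active": []}
    for f in findings:
        if f["layer"] == "base" and not f["fix_available"]:
            queues["deferred_no_fix"].append(f)   # step 1
        elif f["epss"] < EPSS_THRESHOLD:
            queues["below_epss"].append(f)        # step 2
        elif f["load_events_30d"] == 0:
            queues["not_loaded"].append(f)        # step 3
        else:
            queues["active"].append(f)
    queues["active"].sort(key=lambda f: LAYER_PRIORITY[f["layer"]])  # step 4
    return queues

# Placeholder findings chosen to exercise every branch.
findings = [
    {"cve": "CVE-0000-0001", "layer": "base", "fix_available": False, "epss": 0.40, "load_events_30d": 12},
    {"cve": "CVE-0000-0002", "layer": "base", "fix_available": True, "epss": 0.02, "load_events_30d": 3},
    {"cve": "CVE-0000-0003", "layer": "application", "fix_available": True, "epss": 0.30, "load_events_30d": 0},
    {"cve": "CVE-0000-0004", "layer": "intermediate", "fix_available": True, "epss": 0.15, "load_events_30d": 7},
    {"cve": "CVE-0000-0005", "layer": "application", "fix_available": True, "epss": 0.22, "load_events_30d": 40},
    {"cve": "CVE-0000-0006", "layer": "base", "fix_available": True, "epss": 0.50, "load_events_30d": 5},
]
queues = triage(findings)
for name, items in queues.items():
    print(name, [f["cve"] for f in items])
```

Keeping the deferred findings in named queues rather than dropping them is what makes the later re-evaluation possible when a fix ships or an EPSS score moves.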

What to Do When You Have More Criticals Than Sprint Capacity

The uncomfortable truth about container image scanning in production is that most teams will have more critical findings than they can remediate in any given sprint. This is not a failure state — it's the normal operating condition for any team running complex software on a shared base image ecosystem. The question isn't "how do we get to zero criticals" but "which criticals actually represent exploitable risk in our environment, and are we tracking the rest honestly."

The answer is a combination of the triage framework above plus formal risk acceptance for findings you've decided to defer. Risk acceptance should be explicit: it names the finding, the justification for deferral (runtime not loaded, no upstream fix, EPSS below threshold), the owner, and a review date. It should live in your vulnerability tracking system, not in an engineer's head or a comment in a Slack thread. When your SOC 2 auditor or enterprise customer security review asks how you handle known vulnerabilities in your container images, a documented risk acceptance process with triage justifications is the answer they're looking for — not a blank registry or a promise that you fix everything immediately.
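A risk acceptance record with those fields can be small. The sketch below shows one hypothetical shape for it, with the three deferral reasons from the text as an enum and a helper that reports when a record is due for review. The class names, image reference, owner, and dates are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class Justification(Enum):
    NOT_LOADED_AT_RUNTIME = "runtime not loaded"
    NO_UPSTREAM_FIX = "no upstream fix"
    EPSS_BELOW_THRESHOLD = "EPSS below threshold"

@dataclass(frozen=True)
class RiskAcceptance:
    cve: str                      # the finding being deferred
    image: str                    # image the acceptance applies to
    justification: Justification  # why deferral is acceptable
    owner: str                    # who answers for the decision
    review_date: date             # when the decision must be revisited

    def is_due(self, today):
        """True once the review date has been reached."""
        return today >= self.review_date

# Placeholder record.
record = RiskAcceptance(
    cve="CVE-0000-0001",
    image="registry.example.com/ledger-api",
    justification=Justification.NO_UPSTREAM_FIX,
    owner="platform-team",
    review_date=date(2026, 2, 1),
)
print(record.cve, record.justification.value, "due:", record.is_due(date(2026, 1, 1)))
```

Making the record immutable means a changed decision produces a new record instead of silently overwriting the old one, which keeps the audit trail intact.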
