What We Learned Building AppSec Infrastructure from Seed to Series A

We founded Runtimekindle in 2023 with a clear thesis: the root cause of alert fatigue in application security is not that SAST tools are wrong; it's that they have no visibility into whether a finding is reachable at runtime. Eighteen months later, we have a working product, paying customers across the US, and a set of architectural scars. Knowing about them at the start would have saved us about four months of work. This is that story, as honestly as I can tell it.

The First Version Was a Filter, Not an Analysis

We started with the simplest version of the reachability idea: build a call-graph from static analysis, cross-reference SAST findings against it, suppress the ones that aren't on any call path. It worked. Alert volume dropped significantly in our early testing. But security leads at the teams we showed it to asked the same question immediately: "Why is this finding suppressed? How do I know this is right?"
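To make the filter idea concrete, here is a minimal sketch of that kind of reachability check. It is an illustration, not our production code: the call-graph is assumed to be a plain dict of caller to callees, and the function names, finding fields, and entry point are all hypothetical.

```python
from collections import deque

def reachable_functions(call_graph, entry_points):
    """Return every function reachable from the entry points (breadth-first)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def partition_findings(findings, call_graph, entry_points):
    """Split findings into those on a call path and those that are not."""
    live = reachable_functions(call_graph, entry_points)
    kept = [f for f in findings if f["function"] in live]
    suppressed = [f for f in findings if f["function"] not in live]
    return kept, suppressed

# Hypothetical call-graph and findings, for illustration only.
call_graph = {
    "handle_request": ["parse_body", "run_query"],
    "run_query": ["build_sql"],
    "legacy_export": ["unsafe_eval"],  # never called from an entry point
}
findings = [
    {"id": "F-1", "rule": "CWE-89", "function": "build_sql"},
    {"id": "F-2", "rule": "CWE-95", "function": "unsafe_eval"},
]
kept, suppressed = partition_findings(findings, call_graph, ["handle_request"])
```

Note what the sketch throws away: once `suppressed` is computed, nothing records why a finding landed there. That gap is exactly what the next question exposed.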

We didn't have a good answer. The suppression was correct — the finding was genuinely unreachable — but we couldn't show the evidence. The call-graph was an internal data structure we used for filtering, not something we exposed. And without the explanation, the suppression felt arbitrary. A security lead who can't explain why a finding was downranked can't defend that decision to an auditor or a CISO. So they'd rather have the finding visible than have it suppressed by a black box.

We rebuilt. Instead of treating the call-graph as an implementation detail, we made it the primary artifact. Every suppression decision now links to the specific call-graph state that justified it — the execution paths that were analyzed, the code paths that weren't traversed, the reason a finding didn't appear on any live path. This added about three months to our initial timeline. It was the right call. Every customer conversation since has included someone asking "can I see why this was suppressed?" and being able to answer that question is table stakes in AppSec tooling.
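As a rough sketch of what "linking a decision to its evidence" can look like, the example below returns an evidence record alongside every reachability verdict. The record fields and names are assumptions made for illustration; the real artifact carries more state than this.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ReachabilityEvidence:
    finding_id: str
    target: str
    reachable: bool
    entry_points_analyzed: list
    path: list = field(default_factory=list)  # entry point to target, when reachable
    functions_visited: int = 0
    reason: str = ""

def explain(finding_id, target, call_graph, entry_points):
    """Search from the entry points and keep the evidence for the verdict."""
    parent = {ep: None for ep in entry_points}
    queue = deque(entry_points)
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent pointers back to the entry point to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            path.reverse()
            return ReachabilityEvidence(
                finding_id, target, True, list(entry_points),
                path, len(parent), "found on a live call path",
            )
        for callee in call_graph.get(node, ()):
            if callee not in parent:
                parent[callee] = node
                queue.append(callee)
    return ReachabilityEvidence(
        finding_id, target, False, list(entry_points), [], len(parent),
        f"no path from {len(entry_points)} entry point(s) "
        f"across {len(parent)} visited function(s)",
    )

# Hypothetical graph for illustration.
graph = {"main": ["a"], "a": ["b"], "dead": ["c"]}
live = explain("F-1", "b", graph, ["main"])
dead = explain("F-2", "c", graph, ["main"])
```

A record like `dead` is something a security lead can hand to an auditor: which entry points were analyzed, how much of the graph was traversed, and why the finding was not on any of those paths.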

The lesson: in security products, explainability is not a feature you add later. It's part of the core value proposition from day one. Buyers need to understand and defend every decision your tool makes.

Static Call-Graphs Break on Dynamic Dispatch

Our early architecture built call-graphs statically — analyzing source code to map which functions called which other functions. This works well for simple, direct call chains. It breaks in several common patterns that are pervasive in modern web applications: dynamic dispatch (calling a function through a variable), polymorphism (calling a method on an interface without knowing the concrete type), event-driven execution (handlers registered at runtime, not statically analyzable), and reflection.
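A small Python example shows the dynamic dispatch problem in miniature. It is a deliberately naive sketch: the extractor below only records calls made by bare name, which is a simplification and not how a production static analyzer works, and the sample program is hypothetical.

```python
import ast

# A tiny program whose handlers are registered at import time and
# invoked through a dictionary lookup rather than by name.
SOURCE = '''
HANDLERS = {}

def route(path):
    def register(fn):
        HANDLERS[path] = fn
        return fn
    return register

@route("/users/delete")
def delete_user():
    run_query()

def run_query():
    pass

def dispatch(path):
    HANDLERS[path]()
'''

def direct_call_edges(source):
    """Collect (caller, callee) pairs for calls made by bare name in function bodies."""
    edges = set()
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        for stmt in fn.body:  # body only, so decorators are not counted as calls
            for node in ast.walk(stmt):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.add((fn.name, node.func.id))
    return edges

edges = direct_call_edges(SOURCE)
callers = {caller for caller, _ in edges}
```

The extractor sees that `delete_user` calls `run_query`, but it finds no outgoing edge from `dispatch`, because `HANDLERS[path]()` is a call through a subscript, not a name. From the static graph's point of view, nothing ever reaches `delete_user`.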

We discovered this failure mode the hard way. A team was using Runtimekindle to evaluate a Node.js microservice. The service used Express.js with dynamically registered route handlers. Our static call-graph missed most of the actual execution paths, because the entry points were all dynamic registrations that static analysis couldn't follow. We were suppressing findings based on a call-graph that described only a third of the real execution surface.

The fix required adding runtime instrumentation to complement static analysis. Instead of building the call-graph solely from source, we instrument the running application container to observe actual function calls during execution. The result is a hybrid call-graph: the static analysis gives us coverage for code paths that aren't exercised in typical runtime windows; the live instrumentation gives us ground truth for the dynamic paths that static analysis misses.
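One way to picture the hybrid graph is as a union of edges where each edge remembers where it came from. The sketch below assumes edges are simple (caller, callee) tuples and uses made-up function names; it says nothing about how the edges are collected.

```python
def merge_call_graphs(static_edges, runtime_edges):
    """Union two edge sets, tagging each edge with its provenance."""
    merged = {}
    for edge in static_edges:
        merged.setdefault(edge, set()).add("static")
    for edge in runtime_edges:
        merged.setdefault(edge, set()).add("runtime")
    return merged

# Hypothetical edges for illustration.
static_edges = [
    ("app.listen", "router.dispatch"),
    ("get_user", "build_sql"),
]
runtime_edges = [
    ("router.dispatch", "get_user"),  # dynamic registration, invisible statically
    ("get_user", "build_sql"),
]

merged = merge_call_graphs(static_edges, runtime_edges)
runtime_only = sorted(e for e, sources in merged.items() if sources == {"runtime"})
```

Keeping provenance on each edge is a design choice worth making early in a sketch like this: edges seen only at runtime show where static analysis is blind, and edges seen only statically show code that the observation window never exercised.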

The hybrid approach is more complex to implement and maintain. It requires an instrumentation agent that can attach to the application process without disrupting execution. We use eBPF-based observation where the runtime supports it, and language-specific agents for runtimes like the JVM and V8, where eBPF has limited visibility into the interpreter. But it's the only approach that gives correct results across the language diversity we see in production deployments.

We've now instrumented across 12 language runtimes. The architecture for each is somewhat different. Go's goroutine model requires different tracing strategies than Java's thread model. Python's GIL creates timing constraints that affect how you capture concurrent execution paths. Every runtime has its own instrumentation quirks, and we've hit most of them.

Triage Quality Is the Product

Six months into development, we were focused entirely on detection accuracy — getting the call-graph right, minimizing false positive suppressions, maximizing true positive catch rate. Then we started doing more customer interviews, and a pattern emerged that we hadn't expected: detection accuracy wasn't the main friction point. Triage speed was.

Engineers who received a Runtimekindle finding, even a well-prioritized, runtime-reachable one, were still spending significant time understanding what it meant and figuring out what to do about it. CVE and CWE descriptions are written for security researchers, not for developers who need to ship a fix before the sprint ends. A finding that says "CWE-89: Improper Neutralization of Special Elements used in an SQL Command" is not actionable without additional context that the developer typically has to assemble themselves.

We added LLM-based triage summaries. For each finding, the system generates three sections: what vulnerability is exposed and why it's a problem in plain terms, how an attacker could exploit this specific instance (not the generic CVE description, but the specific code path in the customer's application), and the minimal code change to fix it. Mean time to first action on a finding dropped from 4.2 hours to 47 minutes across our early customer cohort. That's a larger impact on developer behavior than the reachability filtering itself, and it was not what we planned to build when we started.

The architecture for generating useful triage summaries is more involved than it looks. Generic CVE summaries are easy. Summaries that reference the specific function, class, and call path in the customer's codebase require combining the CVE metadata, the call-graph context, and the customer's source code in the prompt context. Getting that context window right — enough detail to be useful, not so much that the model hallucinates specifics — took about six weeks of iteration. We're still improving it.
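The context-budget problem can be sketched as priority-ordered assembly under a size limit. This is a simplified illustration under stated assumptions: the budget is counted in characters rather than tokens, the section names are hypothetical, and no model call is shown.

```python
def assemble_context(sections, max_chars):
    """Add sections in priority order, skipping any that would exceed the budget.

    sections: list of (name, text) pairs, highest priority first.
    Returns the assembled prompt context and the names that were included.
    """
    parts, included, used = [], [], 0
    for name, text in sections:
        block = f"## {name}\n{text}\n"
        if used + len(block) > max_chars:
            continue  # too large; a smaller, lower-priority section may still fit
        parts.append(block)
        included.append(name)
        used += len(block)
    return "".join(parts), included

# Hypothetical inputs for illustration.
sections = [
    ("Vulnerability", "CWE-89: SQL injection in query construction"),
    ("Call path", "handle_request -> run_query -> build_sql"),
    ("Full source file", "x" * 5000),  # far too large for the budget
]
context, included = assemble_context(sections, max_chars=500)
```

Returning the list of included sections alongside the text makes it possible to log what the model actually saw for each summary, which helps when a summary turns out to be wrong.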

The Operational Overhead We Underestimated

We spent the first twelve months optimizing for detection quality. We spent the next six months dealing with operational realities we'd underprioritized. Three of them cost us the most time:

Instrumentation agent versioning. We ship an instrumentation agent as a sidecar container that attaches to customer workloads. When we update the agent, customers need to update their deployments. In practice, customers update on their own timeline — which means we're supporting multiple agent versions simultaneously. We didn't design for this early enough. Our first major agent refactor required a version negotiation protocol we hadn't planned for, and we spent three weeks building backward compatibility we should have architected from the start. Build your agent versioning and deprecation policy before you have ten customers on different versions.
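A version negotiation step can be quite small if it is designed in up front. The sketch below assumes integer protocol versions and a server-side deprecation set; both are illustrative assumptions, not a description of our actual protocol.

```python
def negotiate_protocol(agent_versions, server_versions, deprecated=frozenset()):
    """Pick the highest protocol version both sides support.

    Returns (version, warning). version is None when nothing is shared.
    """
    common = set(agent_versions) & set(server_versions)
    if not common:
        return None, "no shared protocol version; agent must be upgraded"
    chosen = max(common)
    warning = ""
    if chosen in deprecated:
        warning = "negotiated version is deprecated; schedule an agent upgrade"
    return chosen, warning

# Hypothetical version sets for illustration.
current = negotiate_protocol([1, 2], [2, 3])
stale = negotiate_protocol([1], [2, 3])
aging = negotiate_protocol([1, 2], [1, 2, 3], deprecated={2})
```

The deprecation warning matters as much as the negotiation: it gives customers on old agents a signal before a version is dropped, instead of a hard failure on the day it is.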

Multi-language call-graph merging. Modern microservice architectures often involve multiple language runtimes. A Node.js frontend calls a Go backend which calls a Python ML service. A vulnerability in the Python service is only exploitable if the call chain from the external-facing Node service can reach it. Our initial architecture treated each service's call-graph independently. We had to add a cross-service call-graph merging layer to correctly assess reachability across language boundaries. That's a non-trivial distributed systems problem and we discovered it six months later than we should have.
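The merging idea can be sketched by namespacing every function with its service and then adding the cross-service calls as ordinary edges. Service names, function names, and the way cross-service calls are discovered are all assumptions here; in practice that discovery is the hard part.

```python
from collections import deque

def merge_service_graphs(service_graphs, cross_service_calls):
    """Build one graph whose nodes are (service, function) pairs."""
    merged = {}
    for service, graph in service_graphs.items():
        for caller, callees in graph.items():
            merged.setdefault((service, caller), set()).update(
                (service, callee) for callee in callees
            )
    for src, dst in cross_service_calls:
        merged.setdefault(src, set()).add(dst)
    return merged

def reachable_from(merged, entry):
    """Breadth-first reachability over the merged graph."""
    seen, queue = {entry}, deque([entry])
    while queue:
        for nxt in merged.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical three-service system for illustration.
service_graphs = {
    "node-frontend": {"POST /predict": ["callBackend"]},
    "go-backend": {"HandlePredict": ["callModel"], "HandleAdmin": ["dumpConfig"]},
    "python-ml": {"predict": ["load_pickle"], "retrain": ["yaml_load"]},
}
cross_service_calls = [
    (("node-frontend", "callBackend"), ("go-backend", "HandlePredict")),
    (("go-backend", "callModel"), ("python-ml", "predict")),
]

merged = merge_service_graphs(service_graphs, cross_service_calls)
exposed = reachable_from(merged, ("node-frontend", "POST /predict"))
```

In this toy system, `load_pickle` in the Python service is reachable from the external endpoint while `yaml_load` is not, even though both live in the same service. Per-service graphs alone cannot tell those two apart.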

False positive handling at scale. At small scale, a few false positives are manageable — you handle them in support tickets. At larger scale, the same false positive pattern appears across many customers simultaneously (a popular library generates a consistent type of false positive), and you need a systematic way to deploy suppression rules across all affected customers without requiring each to configure it individually. We built a shared allowlist infrastructure that lets us deploy pattern-based suppression to all connected accounts in a single update. We should have built it earlier.
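A shared suppression rule can be modeled as a pattern that is evaluated against every account's findings. The sketch below is illustrative only: the rule fields, glob matching, and the example library name are assumptions, not our rule format.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class SuppressionRule:
    rule_id: str        # e.g. a CWE identifier
    package_glob: str   # which library the pattern applies to
    function_glob: str  # which functions inside it
    reason: str

def matching_rule(finding, rules):
    """Return the first shared rule that matches the finding, or None."""
    for rule in rules:
        if (finding["rule"] == rule.rule_id
                and fnmatchcase(finding["package"], rule.package_glob)
                and fnmatchcase(finding["function"], rule.function_glob)):
            return rule
    return None

def apply_shared_rules(findings_by_account, rules):
    """Apply one rule set to every account; keep the reason with each suppression."""
    result = {}
    for account, findings in findings_by_account.items():
        kept, suppressed = [], []
        for finding in findings:
            rule = matching_rule(finding, rules)
            if rule is None:
                kept.append(finding)
            else:
                suppressed.append((finding, rule.reason))
        result[account] = {"kept": kept, "suppressed": suppressed}
    return result

# Hypothetical rule and findings for illustration.
rules = [SuppressionRule("CWE-95", "templating-lib", "compile_*",
                         "input is a build-time constant in this library")]
findings_by_account = {
    "acct-a": [{"rule": "CWE-95", "package": "templating-lib", "function": "compile_template"}],
    "acct-b": [{"rule": "CWE-95", "package": "templating-lib", "function": "compile_cache"},
               {"rule": "CWE-89", "package": "orm-lib", "function": "raw_query"}],
}
result = apply_shared_rules(findings_by_account, rules)
```

Each suppression keeps the rule's reason attached, so a centrally deployed rule is still explainable at the level of an individual finding.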

What We Got Right

Not everything required a rebuild. Three architectural decisions have held up well:

We built on a finding-based data model, not a scan-based one. Every finding has a persistent ID that tracks through its entire lifecycle: first seen, reachability score changes over time, triage notes, assigned owner, remediation state. This makes trend analysis possible — you can ask "which service has the longest-lived open findings?" or "how has this team's mean time to remediate changed over the last six months?" A scan-based model, where each scan is an independent snapshot, makes longitudinal analysis much harder. Get your data model right early.
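The difference between the two models is easiest to see in code. In the sketch below, a finding's ID is derived from a stable fingerprint so that repeated scans update one record instead of creating new ones. The fingerprint fields, the in-memory store, and the score values are assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date

def finding_id(service, rule, function):
    """Derive a stable ID from attributes that survive across scans."""
    raw = f"{service}|{rule}|{function}".encode()
    return hashlib.sha256(raw).hexdigest()[:12]

@dataclass
class Finding:
    id: str
    service: str
    rule: str
    function: str
    first_seen: date
    state: str = "open"
    score_history: list = field(default_factory=list)  # (scan date, reachability score)

def record_scan(store, scan_date, observations):
    """Upsert observations from one scan into the persistent store."""
    for obs in observations:
        fid = finding_id(obs["service"], obs["rule"], obs["function"])
        finding = store.get(fid)
        if finding is None:
            finding = Finding(fid, obs["service"], obs["rule"], obs["function"],
                              first_seen=scan_date)
            store[fid] = finding
        finding.score_history.append((scan_date, obs["score"]))

def oldest_open(store):
    """Answer a longitudinal question: which open finding has lived longest?"""
    open_findings = [f for f in store.values() if f.state == "open"]
    return min(open_findings, key=lambda f: f.first_seen, default=None)

# Two hypothetical scans of the same service, a month apart.
store = {}
sqli = {"service": "billing", "rule": "CWE-89", "function": "build_sql"}
record_scan(store, date(2024, 1, 10), [dict(sqli, score=0.9)])
record_scan(store, date(2024, 2, 10), [
    dict(sqli, score=0.4),
    {"service": "billing", "rule": "CWE-79", "function": "render", "score": 0.7},
])
oldest = oldest_open(store)
```

After the second scan the store holds two findings, and the SQL injection record carries both scores with its original first-seen date. A scan-based model would hold two unrelated snapshots and leave the matching to whoever writes the report.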

We integrated with CI/CD from the beginning, not as an afterthought. Runtimekindle findings appear as checks on pull requests, not as notifications in a separate security tool. This was a deliberate decision to keep security feedback in the workflow developers already use. Tools that require developers to leave their primary workflow to check a security dashboard see significantly lower engagement than tools that surface findings inline. We had data on this from our time at previous companies, and it has held up.

We made deployment non-disruptive by design. The instrumentation agent attaches to containers as a sidecar — no modification to the application code required, no recompilation, no restart. Teams that have to modify their application to deploy a security tool face weeks of internal approval processes. Teams that can deploy a sidecar container deploy on the same day they decide to try the product. Time-to-first-value for security tooling matters more than most security vendors acknowledge.

What I Would Tell Myself at the Start

A few things stand out, looking back. Explainability and traceability are non-negotiable in security products — plan for them from day one, not as features you'll add when customers ask. Static analysis is necessary but not sufficient for call-graph accuracy; you need runtime instrumentation for the dynamic paths that dominate real application behavior. And triage quality — the quality of the human-readable explanation attached to each finding — has more impact on adoption than detection precision once you're above a basic accuracy threshold. Detection gets you in the door. Triage keeps engineers using the tool.

We're still building. The hybrid static-plus-runtime architecture has more depth to explore, particularly for distributed systems where reachability spans service boundaries. But the decisions we made in the first eighteen months — both the ones we got right and the ones we had to redo — shaped the product in ways that are now hard to separate from the architecture itself. That's how it goes when you build something genuinely new.

