by Sebastian Junges — 20 November 2025
Topics: Interview, Awards
On the occasion of receiving the Rance Cleaveland Test of Time Tool Award at ETAPS'25 for the CBMC verification tool, we had the opportunity to interview Daniel Kroening, Peter Schrammel and Michael Tautschnig.
Let’s go back in time a bit. The seminal work on SAT-based Bounded Model Checking was developed in the late 1990s and was initially focussed on models and hardware verification. Even the initial publication underlying CBMC was an ASP-DAC 2003 paper, and the TACAS 2004 tool paper that introduced CBMC still prominently mentioned hardware verification applications. Can you explain the main obstacles that needed to be overcome to make an effective shift to software verification?
Daniel (DK): That initial paper established a crucial bridge: using an ANSI-C program as a reference specification for verifying a hardware design, say a circuit written in Verilog. Even today, the CPROVER ecosystem includes EBMC for handling Verilog and SystemVerilog, continuing that lineage.
However, the leap to verifying general-purpose software was fundamentally different because of the scale of the encoding. When verifying hardware, you encode the logic of a component, like an adder circuit in an ALU, once. When you verify a software program at bit-level, you must encode the equivalent logic for every instance of every operation in the execution trace. If you have an instruction like x = y + z; inside a loop unrolled k times, you need to generate the Boolean circuit for the 32-bit adder k times over.
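To make the duplication concrete, here is a minimal sketch (our illustration, not from the interview; the file name is hypothetical). CBMC copies the loop body once per unwinding, so the bit-level adder for sum = sum + i is instantiated in every copy:

```c
#include <assert.h>

int main(void)
{
  int sum = 0;
  // Under bounded model checking the loop is unwound: the body is
  // duplicated once per iteration, and each copy contains its own
  // 32-bit adder circuit for the addition below.
  for(int i = 0; i < 4; i++)
    sum = sum + i;
  assert(sum == 6); // checked against the unwound, bit-precise encoding
  return 0;
}
```

Running, say, cbmc unroll.c --unwind 5 --unwinding-assertions (a bound safely above the iteration count) checks the assertion over the fully unwound program.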
Michael (MT): This compounding duplication made the problem initially seem intractable. The size and complexity of the resulting SAT formulas for software were orders of magnitude beyond what hardware verification tackled. Back then, when hardware verification was already challenging, it was hard to imagine that SAT solvers could handle formulas representing the execution of real-world C code.
Peter (PS): The success of CBMC, therefore, is a story of technical convergence. Unlike other approaches that used integer and real arithmetic to analyze software, CBMC chose to remain committed to exact bit-level C semantics, even if it meant generating these massive formulas. The reason that this “crazy idea” worked was the breakthrough in modern SAT solver technology in the early 2000s, which became the essential engine enabling CBMC to solve these humongous problem instances.
Today, we have robust SMT solvers, but when CBMC began, this technology was nascent. How was CBMC’s logic translation actually implemented?
MT: The process begins with the C code being translated into our GOTO intermediate representation (GOTO IR), which normalizes control flow. This prepares the program for bounded symbolic execution and conversion into Static Single Assignment (SSA) form, creating a purely functional data-flow graph up to the bound k. The core innovation was the logical encoding itself. Back then, off-the-shelf SMT (Satisfiability Modulo Theories) solvers didn’t exist yet. CBMC’s backend essentially had to become an SMT solver itself: it implements theories for bit-vectors, floating-point arithmetic, arrays, and pointer arithmetic, and reduces the resulting word-level formula to the Boolean level (“bit-blasting”) for the underlying SAT solver. This pioneering work laid the groundwork for modern SMT solvers.
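To illustrate the SSA step (a schematic sketch of ours, not CPROVER’s literal internal output), each assignment gets a fresh indexed name, so the bounded trace becomes a set of equations the backend can bit-blast:

```c
#include <assert.h>

// In CBMC, a declared function without a body returns a
// nondeterministic value; the nondet_ prefix is the usual convention.
unsigned nondet_unsigned(void);

int main(void)
{
  unsigned x = nondet_unsigned(); // SSA: x_0 == nondet_0
  x = x + 1;                      // SSA: x_1 == x_0 + 1
  x = x * 2;                      // SSA: x_2 == x_1 * 2
  assert(x % 2 == 0);             // query: is x_2 % 2 != 0 satisfiable?
  return 0;
}
```

Each equation is purely functional; the backend then turns the word-level operators (+, *, %) into Boolean circuits for the SAT solver.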
I learned that CBMC’s multi-path approach, encoding all control flow paths into a single, monolithic SAT formula, gives it great power. Why did you choose this approach over alternatives, like pathwise symbolic execution?
DK: Pathwise symbolic execution is memory-efficient but suffers from path explosion, since it enumerates execution paths one by one. CBMC’s multi-path approach encodes all possible control-flow paths up to the bound simultaneously. This strategy shifts the computationally hard work entirely onto the SAT solver and gives CBMC a much higher likelihood of producing a proof. The trade-off, however, is higher memory consumption. For industry, this focus on completeness is vital. Finding a bug is nice, but the real value is in providing proof that bugs are absent. This dramatically increases confidence far beyond what testing can achieve.
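A small sketch of the contrast (our example): pathwise symbolic execution would explore the two branches below in separate runs, whereas CBMC guards both branches and merges them at the join point, so a single formula covers every path:

```c
#include <assert.h>

int nondet_int(void);

int main(void)
{
  int x = nondet_int();
  int y;
  if(x > 0)   // guard g == (x_0 > 0)
    y = x;    // y_1 == x_0
  else
    y = -x;   // y_2 == -x_0
  // Join point: y_3 == g ? y_1 : y_2 -- both paths in one formula.
  assert(y >= 0);
  return 0;
}
```

On this example the bit-precise encoding even pays off directly: the SAT solver finds the counterexample x == INT_MIN, where -x wraps around and stays negative.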
PS: Exactly. We’ve seen the need for proofs drive several key developments in the CPROVER ecosystem. BMC gives you a proof up to the bound k, but that’s not the end of the story: it also provides the basic building block for unbounded proofs of correctness. Those rest on inductive proofs, which in turn require invariants. Invariants can be provided as contract annotations in the code or generated automatically using invariant-inference techniques; the work on the 2LS tool has explored approaches such as k-induction and template-based synthesis. We’re continually integrating these methods to transition from being merely a powerful bug finder to a system that generates full, industrial-grade guarantees.
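As a sketch of the contract-based direction (the annotation below reflects recent CBMC loop contracts, applied with a flag such as --apply-loop-contracts; exact syntax and flags depend on the CBMC version), an invariant lets CBMC prove a loop correct without unwinding it a million times:

```c
#include <assert.h>

int main(void)
{
  unsigned n = 1000000; // far too many iterations to unwind explicitly
  unsigned i = 0, sum = 0;
  while(i < n)
    // Instead of unwinding, CBMC checks that this invariant holds on
    // entry and is preserved by an arbitrary iteration, then assumes
    // it together with the negated guard after the loop.
    __CPROVER_loop_invariant(i <= n && sum == i)
    {
      sum = sum + 1;
      i = i + 1;
    }
  assert(sum == n); // follows from the invariant plus i >= n at exit
  return 0;
}
```

The same inductive principle underlies k-induction and template-based synthesis: the hard part is finding the invariant, not checking it.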
Michael, can you tell us a bit about how you operationalize these formal guarantees in an industrial setting like AWS and integrate them into the software engineering process?
MT: The greatest technical impact is achieved when formal verification is seamlessly integrated into the software lifecycle, running automatically on every code change. At AWS, we have pioneered the use of unit proofs—a model inspired by unit testing, but fundamentally different in scope and outcome.
A traditional unit test checks a function with one specific set of inputs and asserts a result. A unit proof, however, uses the power of CBMC to verify the code segment against all possible inputs and executions up to the loop bound. It’s a complete verification of the unit’s behavior under every bounded scenario. This process mathematically proves properties like memory safety (no buffer overflows or null pointer dereferences) and functional correctness for that module.
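As an illustration (our sketch, with a hypothetical function under proof), a unit proof is simply a harness whose inputs are left nondeterministic, so one CBMC run covers every input:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical function under proof: saturating addition on sizes.
static size_t sat_add(size_t a, size_t b)
{
  return (a > SIZE_MAX - b) ? SIZE_MAX : a + b;
}

size_t nondet_size_t(void); // nondeterministic value, CBMC-style

// The unit proof: unlike a unit test with one concrete input pair,
// this harness checks the properties for ALL pairs (a, b).
int main(void)
{
  size_t a = nondet_size_t();
  size_t b = nondet_size_t();
  size_t r = sat_add(a, b);
  assert(r >= a || r == SIZE_MAX); // no silent wrap-around
  assert(r >= b || r == SIZE_MAX);
  return 0;
}
```

Checked in next to the source, a harness like this runs under CBMC (with checks such as --bounds-check and --pointer-check enabled) on every change, which is exactly the CI/CD integration described below.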
These proofs are checked into the repository alongside the source code. They are then executed as part of the Continuous Integration/Continuous Deployment (CI/CD) system. If a developer introduces a change, the unit proofs run, and if the verification passes, we know that the code change hasn’t introduced any violation of a critical property. If it fails, CBMC provides a precise, executable counterexample. This gives developers immediate, high-assurance feedback, fundamentally changing how confidence is maintained as a code base evolves. It’s the transition from testing for failure to proving correctness by construction within the engineering workflow.
Daniel and Peter, you co-founded Diffblue to leverage this technology. Can you speak to the expansion into Java with JBMC and how that evolved into a strategy of integrating multiple techniques?
DK: Expanding to Java with JBMC was a clear commercial opportunity, targeting regression unit-test generation for massive legacy enterprise systems. While the GOTO IR enabled us to integrate Java into CPROVER, the sheer scale of the Java ecosystem is challenging: complex dependencies, heavy string manipulation, vast data structures, reflection, and runtime code modification all made it significantly harder to perform BMC on Java than on C.
PS: Our experience with JBMC showed that while symbolic execution provides unparalleled semantic precision, generating exhaustive test suites for Java applications sometimes requires a more pragmatic tool. This realization led us to augment our toolbox with fuzzing, a technique better suited to exploring the state space of large, complex applications, and more recently with Large Language Models (LLMs) for test generation. The main lesson for the field is that no single tool is a silver bullet. You always need to use the best tool for the job to deliver the best possible value to the user.
Finally, let’s look forward. The rise of LLMs has brought about a paradigm shift. Where does the foundational rigor of CBMC fit into this new world?
DK: LLMs are fantastic at making educated guesses. We have built verification tools on the guess-and-check principle for many years; this is not new. With LLMs, however, we’ve got extremely powerful guessing engines, which make heuristics hard-coded by humans obsolete because LLMs produce the best probabilistic guess possible. For critical software, the defining metric is trust, which mandates minimizing the false-positive rate: a high rate of false alarms destroys a development team’s confidence in a tool. CBMC, as a formal verification tool, provides the crucial piece needed to achieve zero false positives. This is the non-negotiable value proposition of formal methods in the quest for trust.
PS: We see LLMs as a force multiplier for verification. They can generate vast amounts of code and tests quickly. However, deploying unverified, AI-generated code is a huge risk. Whether an LLM generated the code or a human wrote it, for critical systems, the final assurance must come from a tool like CBMC.
MT: Formal verification tools like CBMC and Kani are uniquely positioned to serve as the AI-QA layer—an automated, mathematically sound checker that runs over LLM-generated code to guarantee memory safety and compliance with critical specifications. The future of secure software development is the powerful synergy of generative speed and formal proof.
It’s great to hear that the foundational rigor of model checking tools like CBMC is more relevant than ever. Daniel, Peter, Michael, thank you for sharing this deep insight into the enduring technical legacy and future of CBMC.