© Jean-Claude Moschetti / CNRS Photothèque

On Mechanized Semantics for Verified Compilation

by Eduard Kamburjan — 19 January 2024
Topics: interview

Sandrine Blazy is professor in the computer science department of the University of Rennes and member of Epicure, a joint project-team with Inria and IRISA. Her research activities concern the formal verification using the Coq proof assistant of program transformations and semantic properties of programming languages, such as those found in the CompCert C compiler. A prime application domain is software security. She is awardee of the CNRS Silver Medal, the FME Lucas Award, the ACM SIGPLAN Programming Languages Software Award and the 2021 ACM Software System Award. She is an invited speaker at ETAPS 2024.

You are, among other things, well known for your work on verified compilers and especially CompCert. Can you introduce this topic in a few sentences?

Maybe just to start, I will explain some things in CompCert that I worked on, and of course there are other challenges. CompCert led by Xavier Leroy started about 19 years ago as a research project; the goal was to build and verify a C compiler. In order to get very strong guarantees on the absence of errors in such software, we chose to use deductive verification, where you first write a specification, and then prove that your software satisfies its specification. For a compiler, the specification is a semantic preservation property: the generated code must behave as prescribed by the semantics of the source program. So this specification is about the semantics of the source and target languages of the compiler.

A first challenge was to define a realistic semantics for C and to mechanically reason on this semantics with the Coq proof assistant. Moreover, the semantics is really part of the specification, and we need ways to validate the semantics because what we did was the first fully mechanized formal C semantics. A first solution to validate the semantics was to mechanize operational semantics using different styles, and to prove their equivalence using simulation-based proofs. Another solution was to test our specification, namely to derive an interpreter from the semantics in order to test the semantics on a representative benchmark, but also on unusual C programs generated by a fuzzer for C compilers. End users can also run programs through the executable C semantics to check that these programs are free of undefined behaviors.

What do you think were the biggest insights?

A verified compiler consists of the compiler itself, together with a proof that the compiler does not introduce bugs during compilation. Developing a verified compiler requires both programming the software using the programming language of the proof assistant so that it runs efficiently on real programs, and defining a semantic model and abstractions to reason about, in order to conduct the proof. This software/proof co-design is a true scientific challenge. Indeed, although these two objectives are closely related (as the proof directly uses the software code, in addition to the semantic notions on which the reasoning is done), they are also sometimes antagonistic (e.g., the proof needs to compute additional data structures facilitating the reasoning but slowing down the execution of the compiled programs, or the software uses less efficient functional data structures).

The semantic model we defined for CompCert relies on small-step semantics based on continuations, a new operational style that facilitates semantic reasoning. Indeed, thanks to continuations, we could reason on terminating and diverging program behaviors using a general proof technique based on simulations (where an execution step of the source program is simulated by zero, one or several steps of the execution of the target program). One of the main abstractions is the memory model (where all the values that are computed during the execution of your program are stored) that is defined at different levels of abstraction and shared by the 10 languages of the compiler, which facilitates the proofs as well. Interestingly, having several intermediate languages is surprising for compiler writers, but this a strength for a verified compiler, as it allows proofs to be decomposed and more maintainable, as in a refinement approach.

A last lesson learned is related to the proof methodology we chose to follow. Initially, we were only conducting direct proofs, meaning that you state a theorem and do the proof once for all for any program to compile. But sometimes such proofs become tricky, and there is another way to have proofs: verified a posteriori validation. Instead of proving your property once for all, you prove for each program you actually compile. This is interesting when it makes the property you prove much simpler, because it is only related to one, known program. But the price to pay is that you have to prove this for each program you want to compile, thus increasing the compilation time.

What were the challenges on a more practical level?

Well, we are doing proofs, but we also build software. In Coq, if your program is written using a functional style, you can extract it in OCaml code, so that you can run your software outside your proof assistant. But then, when we started this, at the very, very beginning, this extraction facility was brand new and it was not usable on our compiler at all. Moreover, because our semantics define real languages using predicates, we had to define some induction principles (on which the reasoning is based) by hand instead of using those generated by Coq, which is error-prone. Last, at this time, some Coq libraries for efficient data structures were also missing and then developed by other people.

What do you think are the next steps?

CompCert was designed initially to compile critical embedded software. It was used in industry to compile avionics and nuclear plant software. In these fields, the main motivation for using CompCert was to increase the performance of the generated code, while ensuring all the traceability guarantees required by certification authorities, which CompCert provided. There are many optimizations that could be added to CompCert to increase performance. Loop optimizations are particularly challenging to prove correct. Another direction for future work is shared-memory concurrency in a weakly consistent memory model, as standardized in C 2011. Operational semantics for concurrent C 2011 programs are extremely complex; using them to prove the correctness of a program transformation is an open question.

CompCert is a milestone and an open infrastructure for research. Several works reuse CompCert in order to extend it to other languages, either by developing a new front-end or a new back-end. For instance, the Velus and L2C compilers front-ends for synchronous languages target the Clight language of CompCert. Another example is the Kalray KVX back-end. Other projects such as VST reuse the Clight semantics to define a program logic for proving C programs, which ensures thanks to CompCert that the property established at source level still holds at target level.

In Rennes, in collaboration with colleagues from Imperial College London, we are working on a new target, hardware design for reconfigurable architectures, and a new domain, high-level synthesis (HLS), where the compiler produces a custom hardware design rather than traditional assembly code. HLS allows developers to access the computational power and energy efficiency of custom hardware devices such as FPGAs, while allowing them to take advantage of the practical abstractions of software development. Focusing on HLS will require the design of new semantic models (close to the data-flow style of hardware design that modern HLS tools commonly adopt, for instance, to parallelise code) and proof techniques.

Where do you see the current challenges for mechanizing program semantics?

When we add a compiler pass or change the compilation target, we need to invent new invariants. This may require instrumenting the semantics so that they observe more events, or designing new intermediate languages and their well-founded properties. So, we need better tooling for defining mechanized semantics and reasoning on these semantics. Of course, we can reuse existing ones, but we would like to have more meta-level general tooling. Some people are working on automatically deriving an interpreter or other operational semantics from a semantic definition. But more work is needed to apply these techniques to realistic semantics and to their correctness proofs. Another issue is better proof automation to facilitate the reasoning and more generally the proof engineering.

You mean better reasoning techniques?

The correctness proof of CompCert relies on simulation proofs, namely an elementary and powerful proof technique for operational semantics. In recent works with colleagues, we extended CompCert so that it ensures a security property useful for cryptographic implementations, while still having the compiler correctness. In a more recent work, my former PhD student Aurèle Barrière verified a prototype of a JIT compiler that reuses the CompCert back-end, including its optimizations (also here). In both cases, we faced a proof engineering issue and managed to reuse the existing simulation proofs. These new properties are more challenging to prove as they involve different executions or different versions of a program. We were able to generalize the simulation technique to still apply it and thus reuse existing correctness proofs as much as possible. It would be interesting to further study these extended simulations. Furthermore, maybe there could be more abstract reasoning techniques, something other than simulation. For instance, logical relations, where you embed properties in types is an active field of research that seems promising.

Verified compilers are out of reach from automated provers. Indeed, in order to ensure the required invariants, the proof may need to compute additional data structures. However, better automation from proof assistants would be helpful. For instance, proof transfer where you change the representation of an abstraction without having to prove again the lemmas you already proved previously is another active area of research from which verified compilation would benefit.