Verifying Peephole Rewriting In SSA Compiler IRs

There is an increasing need for domain-specific reasoning in modern compilers. This has fueled the use of tailored intermediate representations (IRs) based on static single assignment (SSA), like in the MLIR compiler framework. Interactive theorem provers (ITPs) provide strong guarantees for the end-to-end verification of compilers (e.g., CompCert). However, modern compilers and their IRs evolve at a rate that makes proof engineering alongside them prohibitively expensive. Nevertheless, well-scoped push-button automated verification tools such as the Alive peephole verifier for LLVM-IR gained recognition in domains where SMT solvers offer efficient (semi) decision procedures. In this paper, we aim to combine the convenience of automation with the versatility of ITPs for verifying peephole rewrites across domain-specific IRs. We formalize a core calculus for SSA-based IRs that is generic over the IR and covers so-called regions (nested scoping used by many domain-specific IRs in the MLIR ecosystem). Our mechanization in the Lean proof assistant provides a user-friendly frontend for translating MLIR syntax into our calculus. We provide scaffolding for defining and verifying peephole rewrites, offering tactics to eliminate the abstraction overhead of our SSA calculus. We prove correctness theorems about peephole rewriting, as well as two classical program transformations. To evaluate our framework, we consider three use cases from the MLIR ecosystem that cover different levels of abstractions: (1) bitvector rewrites from LLVM, (2) structured control flow, and (3) fully homomorphic encryption. We envision that our mechanization provides a foundation for formally verified rewrites on new domain-specific IRs.

翻译：现代编译器对领域特定推理的需求日益增长，这推动了基于静态单赋值（SSA）的定制化中间表示（IR）的使用，例如在MLIR编译器框架中。交互式定理证明器（ITP）为编译器的端到端验证提供了强有力的保证（例如CompCert）。然而，现代编译器及其IR的演进速度使得伴随其进行证明工程的成本变得极其高昂。尽管如此，在SMT求解器能提供高效（半）判定过程的领域中，诸如用于LLVM-IR的Alive窥孔验证器等范围明确的一键式自动化验证工具已获得认可。本文旨在将自动化验证的便利性与ITP的通用性相结合，以验证跨领域特定IR的窥孔重写规则。我们形式化了一个基于SSA的IR核心演算，该演算对IR具有通用性，并覆盖了所谓的区域（即MLIR生态系统中许多领域特定IR使用的嵌套作用域结构）。我们在Lean证明助手中实现的机械化框架提供了一个用户友好的前端，可将MLIR语法转换到我们的演算中。我们提供了用于定义和验证窥孔重写的脚手架，并提供了消除SSA演算抽象开销的策略。我们证明了关于窥孔重写以及两个经典程序变换的正确性定理。为评估我们的框架，我们考虑了MLIR生态系统中的三个用例，涵盖不同抽象层次：（1）来自LLVM的位向量重写，（2）结构化控制流，以及（3）全同态加密。我们设想该机械化框架能为新的领域特定IR上形式化验证的重写提供基础。