QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning

from arxiv, Published in the International Conference on Software Engineering (ICSE) 2025: Alex Sanchez-Stern, Abhishek Varghese, Zhanna Kaufman, Dylan Zhang, Talia Ringer, and Yuriy Brun, QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning, in Proceedings of the 47th International Conference on Software Engineering (ICSE), 2025. doi: 10.1109/ICSE55347.2025.00033

Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling reward-free search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark of 68.5K theorems from 124 open-source Coq projects. QEDCartographer fully automatically proves 21.4% of the test-set theorems. Previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Diva, which combines 62 tools, proves 19.2%. Comparing to the most effective prior tool, Proverbot9001, QEDCartographer produces 34% shorter proofs 29% faster, on average over the theorems both tools prove. Together, QEDCartographer and non-learning-based CoqHammer prove 30.3% of the theorems, while CoqHammer alone proves 26.6%. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.

翻译：形式化验证是生成可靠软件的一种有前景的方法，但手动编写验证证明的困难严重限制了其在实践中的应用。最近的方法通过使用定理证明器引导在证明空间中的搜索，实现了部分证明合成的自动化。然而，定理证明器仅能提供最粗略的进展估计，导致搜索实质上处于无导向状态。为解决此问题，我们创建了QEDCartographer，这是一种结合监督学习与强化学习以更有效探索证明空间的自动化证明合成工具。QEDCartographer融入了证明的分支结构，实现了无奖励搜索，克服了形式化验证固有的稀疏奖励问题。我们使用包含来自124个开源Coq项目的68.5K个定理的CoqGym基准测试集对QEDCartographer进行评估。QEDCartographer能够全自动地证明测试集中21.4%的定理。先前仅依赖监督学习的基于搜索的证明合成工具Tok、Tac、ASTactic、Passport和Proverbot9001分别证明了9.6%、9.8%、10.9%、12.5%和19.8%。结合了62种工具的Diva证明了19.2%。与先前最有效的工具Proverbot9001相比，在两者均能证明的定理上，QEDCartographer平均生成的证明长度缩短34%，速度提升29%。QEDCartographer与非学习基础的CoqHammer共同证明了30.3%的定理，而CoqHammer单独证明了26.6%。我们的工作表明，强化学习是改进证明合成工具搜索机制的一个富有成效的研究方向。