One important approach to software verification is interactive theorem proving. However, writing formal proofs often requires substantial human effort, making proof automation highly important. Traditionally, proof automation has relied on symbolic provers. Recently, large language models (LLMs) have demonstrated strong capabilities in theorem proving, complementing symbolic provers. Nonetheless, prompting LLMs can be expensive and may pose security risks for confidential codebases. As a result, purely symbolic approaches remain important even in the LLM era, as they are cost-effective, secure, and complement the strengths of LLMs. Motivated by these considerations, we ask a new research question: can we extract the internal strategies of LLMs to enhance the capabilities of symbolic provers? As an initial attempt to answer this question, we propose Strat2Rocq, which extracts proof strategies from LLMs and formalizes them as lemmas in Rocq. These lemmas are accessible to symbolic provers such as CoqHammer. With the addition of these LLM-extracted lemmas, CoqHammer is able to prove more theorems. The knowledge extraction process involves analyzing the proof trajectories of LLMs on a training set of proved theorems. For each theorem, we prompt the LLM to generate a natural language proof, then ask it to summarize this proof into formalized lemmas with proofs. We also employ a standard agentic approach to mitigate errors during formalization. Our evaluation demonstrates that, on open-source Rocq projects for software verification, Strat2Rocq enhances the success rate of CoqHammer by 13.41%.
翻译:软件验证的一个重要方法是交互式定理证明。然而,编写形式化证明通常需要大量的人力,这使得证明自动化变得至关重要。传统上,证明自动化依赖于符号证明器。最近,大型语言模型在定理证明方面展现出强大的能力,对符号证明器形成了补充。尽管如此,提示大型语言模型可能成本高昂,并且可能对机密代码库构成安全风险。因此,即使在大型语言模型时代,纯符号方法仍然非常重要,因为它们成本效益高、安全,并且能补充大型语言模型的优势。基于这些考虑,我们提出了一个新的研究问题:我们能否提取大型语言模型的内部策略来增强符号证明器的能力?作为回答这个问题的初步尝试,我们提出了Strat2Rocq,它从大型语言模型中提取证明策略,并将其形式化为Rocq中的引理。这些引理可供CoqHammer等符号证明器使用。通过添加这些从大型语言模型提取的引理,CoqHammer能够证明更多的定理。知识提取过程涉及分析大型语言模型在已证明定理的训练集上的证明轨迹。对于每个定理,我们提示大型语言模型生成一个自然语言证明,然后要求其将此证明总结为带有证明的形式化引理。我们还采用标准的智能体方法来减轻形式化过程中的错误。我们的评估表明,在用于软件验证的开源Rocq项目中,Strat2Rocq将CoqHammer的成功率提高了13.41%。