Formal verification using proof assistants, such as Coq, enables the creation of high-quality software. However, the verification process requires significant expertise and manual effort to write proofs. Recent work has explored automating proof synthesis using machine learning and large language models (LLMs). This work has shown that identifying relevant premises, such as lemmas and definitions, can aid synthesis. We present Rango, a fully automated proof synthesis tool for Coq that automatically identifies relevant premises and also similar proofs from the current project and uses them during synthesis. Rango uses retrieval augmentation at every step of the proof to automatically determine which proofs and premises to include in the context of its fine-tuned LLM. In this way, Rango adapts to the project and to the evolving state of the proof. We create a new dataset, CoqStoq, of 2,226 open-source Coq projects and 196,929 theorems from GitHub, which includes both training data and a curated evaluation benchmark of well-maintained projects. On this benchmark, Rango synthesizes proofs for 32.0% of the theorems, which is 29% more theorems than the prior state-of-the-art tool Tactician. Our evaluation also shows that Rango adding relevant proofs to its context leads to a 47% increase in the number of theorems proven.
翻译:使用Coq等证明辅助工具进行形式化验证能够创建高质量的软件。然而,验证过程需要大量专业知识和人工努力来编写证明。近期研究探索了利用机器学习和大型语言模型(LLMs)实现证明合成的自动化。这些研究表明,识别相关前提(如引理和定义)有助于合成。我们提出了Rango——一个用于Coq的全自动证明合成工具,它能自动识别当前项目中的相关前提及相似证明,并在合成过程中加以利用。Rango在证明的每个步骤都采用检索增强技术,通过其微调的LLM自动确定将哪些证明和前提纳入上下文。通过这种方式,Rango能够适应具体项目及证明的动态演化状态。我们创建了包含2,226个开源Coq项目和196,929个定理的新数据集CoqStoq(源自GitHub),其中既包含训练数据,也包含来自维护良好项目的精选评估基准。在该基准测试中,Rango成功合成了32.0%定理的证明,相比先前最先进的工具Tactician,可证明定理数量提升了29%。我们的评估还表明,Rango在上下文中添加相关证明可使已验证定理数量增加47%。