Theorem proving is a fundamental task in mathematics. With the advent of large language models (LLMs) and interactive theorem provers (ITPs) like Lean, there has been growing interest in integrating LLMs and ITPs to automate theorem proving. In this approach, the LLM generates proof steps (tactics), and the ITP checks the applicability of the tactics at the current goal. The two systems work together to complete the proof. In this paper, we introduce DS-Prover, a novel dynamic sampling method for theorem proving. This method dynamically determines the number of tactics to apply to expand the current goal, taking into account the remaining time compared to the total allocated time for proving a theorem. This makes the proof search process more efficient by adjusting the balance between exploration and exploitation as time passes. We also augment the training dataset by decomposing simplification and rewrite tactics with multiple premises into tactics with single premises. This gives the model more examples to learn from and helps it to predict the tactics with premises more accurately. We perform our experiments using the Mathlib dataset of the Lean theorem prover and report the performance on two standard datasets, MiniF2F and ProofNet. Our methods achieve significant performance gains on both datasets. We achieved a state-of-the-art performance (Pass@1) of 14.2% on the ProofNet dataset and a performance of 29.8% on MiniF2F, slightly surpassing the best-reported Pass@1 of 29.6% using Lean.
翻译:定理证明是数学中的基本任务。随着大语言模型和Lean等交互式定理证明器的出现,整合大语言模型与交互式定理证明器以实现自动化定理证明的研究日益受到关注。在该方法中,大语言模型生成证明步骤(策略),交互式定理证明器验证策略在当前目标下的可应用性。两个系统协同工作以完成证明。本文提出DS-Prover,一种新颖的动态采样定理证明方法。该方法根据证明定理的剩余时间与总分配时间的比例,动态确定扩展当前目标所需应用的策略数量。通过随时间推移调整探索与利用之间的平衡,使证明搜索过程更加高效。我们还通过将包含多个前提的简化与重写策略分解为单一前提策略来增强训练数据集,为模型提供更多学习样本,帮助其更准确地预测带前提的策略。我们使用Lean定理证明器的Mathlib数据集进行实验,并在两个标准数据集MiniF2F和ProofNet上报告性能。该方法在两个数据集上均获得显著性能提升:在ProofNet数据集上达到14.2%的最优性能(Pass@1),在MiniF2F上达到29.8%,略高于使用Lean报告的最佳Pass@1结果29.6%。