Large Language Models (LLMs) have demonstrated significant promise in formal theorem proving. In this study, we investigate the ability of LLMs to discover novel theorems and produce verified proofs. We propose a pipeline called \textit{Conjecturing-Proving Loop} (CPL), which iteratively generates mathematical conjectures and attempts to prove them in Lean 4. A key feature of CPL is that each iteration conditions the LLM on previously generated theorems and their formal proofs, enabling parameter-free improvement of proof strategies via in-context learning. We provide both theoretical and experimental evidence that CPL increases the discovery rate of hard-to-prove theorems compared to frameworks that generate statements and proofs simultaneously. Moreover, our experiments show that reusing the LLM's own formally verified outputs as context consistently improves subsequent proof success, demonstrating the effectiveness of self-generated in-context learning for neural theorem proving. The source code is available at https://github.com/auto-res/ConjecturingProvingLoop.
翻译:大语言模型(LLMs)在形式化定理证明领域展现出显著潜力。本研究探讨了LLMs发现新定理并生成已验证证明的能力。我们提出了一种名为\textit{猜想-证明循环}(Conjecturing-Proving Loop, CPL)的流水线,该流水线迭代生成数学猜想,并尝试在Lean 4中证明它们。CPL的一个关键特性在于,每次迭代都会基于先前生成的定理及其形式化证明对LLM进行条件化处理,从而通过上下文学习实现无参数调整的证明策略改进。我们提供了理论与实验证据,表明相较于同时生成命题与证明的框架,CPL能提升难证定理的发现率。此外,我们的实验显示,将LLM自身生成的形式化验证输出作为上下文进行重用,能持续提升后续证明成功率,这证明了自生成上下文学习在神经定理证明中的有效性。源代码可见于https://github.com/auto-res/ConjecturingProvingLoop。