Large language models have recently made significant progress in generating rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) remains challenging and computationally expensive, particularly for problems at the undergraduate level and beyond. In this work, we present \textbf{Seed-Prover 1.5}, a formal theorem-proving model trained via large-scale agentic reinforcement learning, alongside an efficient test-time scaling (TTS) workflow. Through extensive interactions with Lean and other tools, the model continuously accumulates experience during the RL process, substantially enhancing both the capability and the efficiency of formal theorem proving. Furthermore, leveraging recent advances in natural language proving, our TTS workflow efficiently bridges the gap between natural and formal languages. Compared to state-of-the-art methods, Seed-Prover 1.5 achieves superior performance with a smaller compute budget. It solves \textbf{88\% of PutnamBench} (undergraduate-level), \textbf{80\% of Fate-H} (graduate-level), and \textbf{33\% of Fate-X} (PhD-level) problems. Notably, using our system, we solved \textbf{11 out of 12 problems} from Putnam 2025 within 9 hours. Our findings suggest that scaling learning from experience, driven by high-quality formal feedback, holds immense potential for the future of formal mathematical reasoning.