Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 162 theorems previously unproved by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.
翻译:大型语言模型(LLMs)在与Lean等交互式证明助手结合时,已在形式化定理证明等数学推理任务中取得成功。现有方法通过在特定数据集上训练或微调LLM,使其在特定领域(如本科水平数学)表现良好。这些方法难以推广至高等数学领域。一个根本性局限在于这些方法仅适用于静态领域,未能捕捉数学家通常跨多个领域并行或循环开展研究的工作模式。本文提出LeanAgent,一种用于定理证明的新型终身学习框架,能够持续泛化并改进不断扩展的数学知识,同时不遗忘已学知识。LeanAgent引入多项关键创新:包括基于数学难度优化学习路径的课程学习策略、用于高效管理演化数学知识的动态数据库,以及平衡稳定性与可塑性的渐进式训练方法。LeanAgent在23个不同的Lean代码库中成功证明了162条此前未被人类证明的定理,其中许多来自高等数学领域。其性能显著优于静态LLM基线,在抽象代数、代数拓扑等领域证明了具有挑战性的定理,并展现出从基础概念到高级主题的清晰学习进程。此外,我们分析了LeanAgent在关键终身学习指标上的卓越表现。该框架在稳定性和后向迁移方面取得优异评分——新任务的学习能够提升已学任务的性能。这凸显了LeanAgent持续泛化与改进的能力,从而解释了其卓越的定理证明性能。