Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for formal theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully generates formal proofs for 155 theorems across 23 diverse Lean repositories where formal proofs were previously missing, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.
翻译:大型语言模型(LLM)在与Lean等交互式证明助手结合时,已在形式化定理证明等数学推理任务中取得成功。现有方法通过在特定数据集上训练或微调LLM,使其在特定领域(如本科水平数学)表现良好。这些方法难以推广至高等数学领域。一个根本性局限在于,这些方法在静态领域上运行,未能捕捉数学家通常跨多个领域和项目同时或循环工作的方式。我们提出LeanAgent,一种用于形式化定理证明的新型终身学习框架,能够持续泛化并改进不断扩展的数学知识,同时不遗忘已学知识。LeanAgent引入了多项关键创新,包括:一种根据数学难度优化学习轨迹的课程学习策略、一个用于高效管理演化数学知识的动态数据库,以及平衡稳定性与可塑性的渐进式训练方法。LeanAgent成功为23个不同的Lean代码库中缺失形式化证明的155个定理生成了形式化证明,其中许多定理来自高等数学领域。其性能显著优于静态LLM基线,证明了抽象代数、代数拓扑等领域的挑战性定理,并展现出从基础概念到高级主题的清晰学习进程。此外,我们分析了LeanAgent在关键终身学习指标上的优越表现。LeanAgent在稳定性和后向迁移方面取得了优异分数,其中学习新任务能够提升对已学任务的性能。这强调了LeanAgent的持续泛化与改进能力,解释了其卓越的定理证明性能。