While the ecosystem of Lean and Mathlib has enjoyed celebrated success in formal mathematical reasoning with the help of large language models (LLMs), the absence of many folklore lemmas in Mathlib remains a persistent barrier that limits Lean's usability as an everyday tool for mathematicians like \LaTeX{} or Maple. To address this, we introduce MathlibLemma, a modular LLM-based pipeline for automated folklore-lemma mining: the discovery, formalization, and proving of reusable intermediate facts that mathematicians often take for granted but that are not always present in formal libraries. At its core, MathlibLemma proactively mines the missing connective tissue of mathematics. The pipeline produces a verified library of folklore-style lemmas, including 1,506 Lean-checked proofs that pass a proof-bypass screen; a small curated pilot subset has also been merged into Mathlib, providing external evidence that selected outputs can meet expert library standards. Leveraging this pipeline, we further construct the MathlibLemma benchmark, a suite of 4,028 non-trivial type-checked Lean statements spanning a broad range of mathematical domains. By transforming the role of LLMs from passive consumers to active contributors, this work takes a step toward AI-assisted expansion of formal mathematical libraries.
翻译:[translated abstract in Chinese]
尽管Lean与Mathlib的生态系统在大语言模型(LLMs)的辅助下,已在形式数学推理领域取得了显著成功,但Mathlib中缺乏许多民间引理(folklore lemmas)的问题,始终是阻碍其像\LaTeX{}或Maple那样成为数学家日常工具的一大障碍。为解决这一问题,我们提出MathlibLemma——一种基于LLM的模块化流水线,用于自动化民间引理挖掘:发现、形式化并证明数学家通常视为理所当然、但在形式化库中往往缺失的可复用中间结论。在核心层面,MathlibLemma主动挖掘数学中缺失的连接性组织。该流水线生成一个经验证的民间风格引理库,包含1,506个通过证明旁路筛查的Lean验证证明;其中一小部分精心筛选的试点数据已被合并至Mathlib,这一外部证据表明所选输出能够达到专家库标准。基于这一流水线,我们进一步构建了MathlibLemma基准测试集,包含4,028个覆盖广泛数学领域、经类型检查的非平凡Lean语句。通过将LLMs的角色从被动使用者转变为主动贡献者,本工作向实现形式数学库的AI辅助扩展迈出了一步。