MerLean-Prover is an end-to-end Lean4 theorem prover that replaces sorry declarations with kernel-checkable proofs. It is built from three agent types (Planning, Check, and Lean) composed by a recursive outer loop whose unit of revision is the proof plan itself, and uses no fine-tuning, no custom RL objective, and no theorem-specific scaffolding. On FormalQualBench, a benchmark of 23 PhD-qualifying-exam theorems, MerLean-Prover solves 10/23, surpassing the strongest published open-source baseline (OpenGauss, 8/23). On Putnam2025, the same harness closes 12/12 with substantially lower total wall-clock than the next-best system that closes the full set. The harness also transfers to smaller models: Sonnet closes all four tested FormalQualBench problems, and Haiku closes the two short ones. These results suggest that harness design is a central factor in end-to-end Lean4 theorem proving, alongside raw model capability, and that a relatively simple harness can already be effective.
翻译:MerLean-Prover是一种端到端的Lean4定理证明器,它用内核可验证的证明替换了“sorry”声明。该系统由三种智能体类型(规划型、检查型和Lean型)构成,这些智能体通过递归外层循环组合而成,其修订单元为证明计划本身,且无需微调、无需自定义强化学习目标,也无需针对特定定理的辅助框架。在由23道博士资格考试定理组成的基准测试FormalQualBench上,MerLean-Prover解决了10/23的定理,超越了最强的已发布开源基线(OpenGauss,8/23)。在Putnam2025上,同一框架以显著低于次优完整求解系统的总运行时间完成了12/12的题目。该框架还可迁移至较小的模型:Sonnet能完成所有四项测试的FormalQualBench问题,而Haiku能完成两项短问题。这些结果表明,在端到端Lean4定理证明中,框架设计与原始模型能力同等重要,且相对简单的框架已能发挥有效作用。