Formal mathematics libraries are rapidly expanding, creating a growing need to refactor verified proofs for maintainability and to improve training data quality for neural provers. However, scalable proof optimization is hindered by heterogeneous and heuristically specified objectives, scarce data, and high training and inference costs. To overcome these challenges, we introduce ImProver 2, a neurosymbolic framework for automated proof optimization in Lean 4. ImProver 2 combines a data-efficient expert-iteration pipeline with a scaffold that exposes formal structure alongside lightweight informal abstractions. We further introduce a suite of metrics capturing structural proof properties. Using ImProver 2, we train a 7B-parameter model that outperforms orders-of-magnitude larger models within the same model family, and is competitive with mid-tier frontier models across metrics. We additionally demonstrate that our neurosymbolic scaffold significantly improves performance across both small and frontier models. We show that with proper scaffolding and training, small models can effectively restructure research-level proofs over complex and varied metrics, matching substantially larger systems and establishing proof optimization as a scalable, learnable task.
翻译:形式化数学库正在快速扩展,由此产生了对已验证证明进行重构以提升可维护性、以及改善神经证明器训练数据质量的持续需求。然而,可扩展的证明优化受到异构且启发式设定的目标、稀缺数据以及高昂的训练与推理成本的制约。为应对这些挑战,我们提出了ImProver 2——一个面向Lean 4中自动证明优化的神经符号框架。ImProver 2将数据高效的专家迭代流程与一个能够暴露形式化结构并辅以轻量级非形式化抽象的支架相结合。我们进一步引入了一套度量结构证明属性的指标。利用ImProver 2,我们训练了一个70亿参数模型,其在同一模型家族中性能超越了数量级更大的模型,并在各项指标上与中端前沿模型具有竞争力。我们还证明了我们的神经符号支架能够显著提升小型及前沿模型的性能。研究表明,通过适当的支架与训练,小型模型能够基于复杂且多样化的指标有效地重构研究级证明,与规模大得多的系统相匹敌,并将证明优化确立为一项可扩展、可学习的任务。