Automated theorem proving systems built on Lean 4 increasingly rely on parallel tactic search over partially specified proofs, such as those generated by Draft-Sketch-Prove (DSP) pipelines. In current systems, each search branch reconstructs a proof state by re-running elaboration, leading to substantial per-branch overhead. In Lean 4 with Mathlib, this cost has two components: (1) import loading, which deserializes pre-compiled libraries (~60 s per branch); and (2) theorem-body elaboration, which re-checks the theorem context up to the target goal (estimated 18-735 s depending on proof complexity). Together, these account for >99% of per-branch wall time, making portfolio-based search impractical at scale. We observe that this overhead arises from a mismatch between the structure of proof search and its execution model: branching is implemented via repeated reconstruction of proof states rather than direct reuse. To address this, we introduce proof-state snapshotting, which captures the elaborated proof state once and reuses it across branches via a small extension to the Lean 4 language server. Across 48 miniF2F-v2 problems (45 prove-phase benchmarks and 3 full end-to-end runs), our approach achieves a 5.6-50x wall-time speedup over the standard fallback (average 14x, median 9.7x). Speedup increases with the number of proof branches. Our method is orthogonal to import-level caching (e.g., Kimina Lean Server), which avoids import loading but not theorem-body elaboration. The patched Lean binary and the Snapshot-DSP pipeline will be released as open source upon publication.
翻译:基于 Lean 4 的自动定理证明系统越来越依赖于对部分指定证明(例如由 Draft-Sketch-Prove (DSP) 流程生成的证明)进行并行策略搜索。在当前系统中,每个搜索分支均通过重新运行 elaboration 来重建证明状态,这导致每个分支产生大量开销。在搭载 Mathlib 的 Lean 4 中,该开销包含两部分:(1) 导入加载——反序列化预编译库(每分支约 60 秒);(2) 定理体展开(theorem-body elaboration)——重新检查至目标目标的定理上下文(根据证明复杂度估算为 18–735 秒)。两者合计占每分支挂钟时间的 99% 以上,使得基于组合的搜索在大规模情况下不可行。我们观察到,这种开销源于证明搜索的结构与执行模型之间的不匹配:分支是通过重复重建证明状态而非直接重用来实现的。为解决此问题,我们引入了证明状态快照技术,该技术一次性获取展开后的证明状态,并借助对 Lean 4 语言服务器的微小扩展在多个分支间重用该状态。在 48 个 miniF2F-v2 问题(45 个证明阶段基准测试和 3 次完整端到端运行)上,我们的方法相较于标准后备方案实现了 5.6–50 倍的挂钟时间加速(平均 14 倍,中位数 9.7 倍)。加速比随证明分支数量的增加而提升。我们的方法与导入级缓存(例如 Kimina Lean Server)正交,后者可避免导入加载但无法避免定理体展开。修补后的 Lean 二进制文件和 Snapshot-DSP 流程将在发表后作为开源代码发布。