Agentic theorem provers often introduce intermediate lemmas, proof sketches, or subgoal decompositions before returning to tactic-level search. This can look like an expensive detour: if proving lemmas is itself hard, why should a learned prover spend effort there? We give a statistical learning answer. Instead of worst-case proof complexity over all formulas, we study the biased data distribution produced by a teacher prover: initial theorem states together with successful verified proof traces. We model proof search as a deterministic finite-horizon MDP and analyze offline imitation learning from those traces. The success bounds depend on the average length of teacher proofs, how predictable the teacher's next action is, and how accurately the student learns that local prediction problem. A flat student learns from fully inlined traces, so repeated subproofs appear many times in its training and test-time certificate. A hierarchical student instead predicts a reusable proof DAG and solves each shared block once. When flattening duplicates the same hard local argument exponentially many times, the sufficient-sample certificate produced by our bounds can be exponentially smaller for the hierarchical learner. This gives a concrete statistical mechanism by which reusable proof structure helps verifier-based theorem proving.
翻译:智能体定理证明器在返回策略级搜索之前,通常会引入中间引理、证明草图或子目标分解。这看似是昂贵的绕路:如果证明引理本身就很困难,为何学习型证明器要在此处投入精力?我们从统计学习角度给出答案。我们研究的是教师证明器产生的有偏数据分布(初始定理状态与经验证的成功证明轨迹),而非所有公式的最坏情况证明复杂度。我们将证明搜索建模为确定性的有限时域马尔可夫决策过程,并分析从这些轨迹中进行的离线模仿学习。成功界限取决于教师证明的平均长度、教师下一步动作的可预测性,以及学生从局部预测问题中学习的准确度。平面学生学习完全内联展开的轨迹,因此重复的子证明会在其训练和测试时的证书中多次出现。层次化学生则预测可复用的证明有向无环图,并对每个共享模块仅求解一次。当扁平化操作将相同的困难局部论证重复指数级次数时,我们通过界限生成的充分样本证书对层次化学习者而言可能呈指数级减小。这给出了一种具体的统计机制,解释了可复用的证明结构如何为基于验证器的定理证明提供帮助。