We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean) by modeling tactic proposal as a stochastic policy in a finite-horizon deterministic MDP. To capture modern representation learning, we treat the state and action spaces as general compact metric spaces and assume Lipschitz policies. To explain the gap between worst-case hardness and empirical success, we introduce problem distributions generated by a reference policy $q$, including a latent-variable model in which proofs exhibit reusable cut/lemma/sketch structure represented by a proof DAG. Under a top-$k$ search protocol and Tsybakov-type margin conditions, we derive lower bounds on finite-horizon success probability that decompose into search and learning terms, with learning controlled by sequential Rademacher/covering complexity. Our main separation result shows that when cut elimination expands a DAG of depth $D$ into a cut-free tree of size $\Omega(\Lambda^D)$ while the cut-aware hierarchical process has size $O(\lambda^D)$ with $\lambda \ll \Lambda$, a flat (cut-free) learner provably requires exponentially more data than a cut-aware hierarchical learner. This provides a principled justification for subgoal decomposition in recent agentic theorem provers.
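The size separation between the cut-free expansion and the shared proof DAG can be illustrated with a minimal numeric sketch. This is our own toy model, not the paper's construction: we assume a layered DAG with a fixed number of shared lemmas per level (`width`) and a fixed number of premises per node (`branch`), so that inlining every shared lemma (cut elimination) multiplies the size per level while the DAG itself grows only additively in depth.

```python
# Toy illustration of the DAG-vs-tree size gap (hypothetical model, not the
# paper's construction). `width` shared lemmas per level; each node cites
# `branch` lemmas from the next level.
from functools import lru_cache

def dag_size(depth: int, width: int) -> int:
    """Nodes in the shared (cut-aware) proof DAG: `width` lemmas per level."""
    return depth * width + 1  # +1 for the root goal

def tree_size(depth: int, width: int, branch: int) -> int:
    """Size of the cut-free tree obtained by inlining every shared lemma."""
    @lru_cache(maxsize=None)
    def expand(level: int) -> int:
        if level == depth:
            return 1  # axiom leaf
        # inlining copies each cited lemma's entire subtree
        return 1 + branch * expand(level + 1)
    return expand(0)

for D in (4, 8, 12):
    print(D, dag_size(D, 3), tree_size(D, 3, 4))
```

With `branch = 4` the inlined tree grows like $4^D$ while the DAG stays linear in $D$, mirroring the abstract's $\Omega(\Lambda^D)$ versus $O(\lambda^D)$ gap (here the DAG growth is even milder than exponential, which is consistent with the upper bound).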