Large language models (LLMs) are increasingly used in workflows for generating formal proofs in Lean. These workflows often decompose problems into smaller lemmas, sample many proof attempts, and use compiler feedback to guide search. However, they can be prohibitively expensive, often spending substantial compute on attempts that ultimately fail. In this work, we address this problem with an action routing agent that consists of a data plane and a control plane. The data plane generates natural-language lemma decompositions, formalizes them in Lean, and samples proof attempts for the resulting theorem and lemma targets. The control plane observes previous failed Lean attempts, estimates both the likelihood of success and cost of another attempt, and decides whether to continue proving the current target or restart from a new breakdown. On a subset of PutnamBench, our agent decreases the cost by $28.9\%$ over a fixed-step baseline on average, preserving performance while using substantially less compute. These results suggest that failed Lean trajectories provide actionable signals for cost-aware resource allocation in agentic theorem proving.
翻译:大型语言模型(LLMs)在Lean形式化证明的生成流程中应用日益广泛。这些流程通常将问题分解为更小的引理,采样大量证明尝试,并利用编译器反馈来指导搜索。然而,此类方法可能极其昂贵,往往将大量算力消耗在最终失败的尝试上。在本工作中,我们通过一个由数据平面和控制平面组成的动作路由智能体来解决此问题。数据平面生成自然语言引理分解,在Lean中将其形式化,并对由此产生的定理和引理目标采样证明尝试。控制平面观察先前失败的Lean尝试,估算下一次尝试的成功概率与成本,并决定是继续证明当前目标还是从新的分解重新开始。在PutnamBench子集上,我们的智能体相较于固定步长基线平均降低了28.9%的成本,在保持性能的同时显著减少了算力消耗。这些结果表明,失败的Lean轨迹为智能定理证明中的成本感知资源分配提供了可操作的信号。