Traditional language model-based theorem proving assumes that by training on a sufficient amount of formal proof data, a model will learn to prove theorems. Our key observation is that a wealth of informal information that is not present in formal proofs can be useful for learning to prove theorems. For instance, humans think through steps of a proof, but this thought process is not visible in the resulting code. We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof, thereby boosting the model's theorem-proving capabilities. Lean-STaR uses retrospective ground-truth tactics to generate synthetic thoughts for training the language model. At inference time, the trained model directly generates the thoughts prior to the prediction of the tactics in each proof step. Building on the self-taught reasoner framework, we then apply expert iteration to further fine-tune the model on the correct proofs it samples and verifies using the Lean solver. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment, significantly outperforming base models ($\boldsymbol{43.4\% \rightarrow 46.3\%}$, Pass@64). We also analyze the impact of the augmented thoughts on various aspects of the theorem proving process, providing insights into their effectiveness.
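The expert-iteration loop described above (sample thought-augmented proof steps, keep those the Lean checker accepts, fine-tune on the survivors) might be sketched as follows. This is a minimal illustration, not the paper's implementation: `model`, `verify`, and the fine-tuning step are hypothetical placeholders, and Lean verification is stubbed out.

```python
def expert_iteration(model, theorems, verify, rounds=1):
    """Sample (thought, tactic) pairs for each theorem, keep only the
    attempts the verifier accepts, and return them as training data.
    In the real system, the model would be fine-tuned on this data
    after each round; that step is omitted here."""
    training_data = []
    for _ in range(rounds):
        for thm in theorems:
            # The model emits an informal thought first, then a tactic.
            thought, tactic = model(thm)
            if verify(thm, tactic):  # stand-in for a Lean proof check
                training_data.append((thm, thought, tactic))
        # fine_tune(model, training_data)  # omitted in this sketch
    return training_data


# Toy stand-ins to exercise the loop (purely illustrative).
def toy_model(thm):
    return (f"try simp on {thm}", "simp")

def toy_verify(thm, tactic):
    return tactic == "simp" and thm.endswith("_easy")

data = expert_iteration(toy_model, ["add_comm_easy", "hard_thm"], toy_verify)
```

The key design point the abstract emphasizes is that the thought is generated *before* the tactic, so the tactic prediction is conditioned on the informal reasoning rather than on the proof state alone.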