Lean-STaR: Learning to Interleave Thinking and Proving

Traditional language model-based theorem proving assumes that by training on a sufficient amount of formal proof data, a model will learn to prove theorems. Our key observation is that a wealth of informal information that is not present in formal proofs can be useful for learning to prove theorems. For instance, humans think through steps of a proof, but this thought process is not visible in the resulting code. We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof, thereby boosting the model's theorem-proving capabilities. Lean-STaR uses retrospective ground-truth tactics to generate synthetic thoughts for training the language model. At inference time, the trained model directly generates the thoughts prior to the prediction of the tactics in each proof step. Building on the self-taught reasoner framework, we then apply expert iteration to further fine-tune the model on the correct proofs it samples and verifies using the Lean solver. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment, significantly outperforming base models ($\boldsymbol{43.4\% \rightarrow 46.3\%,}$ Pass@64). We also analyze the impact of the augmented thoughts on various aspects of the theorem proving process, providing insights into their effectiveness.

翻译：传统的基于语言模型的定理证明假设，通过在足够多的形式化证明数据上进行训练，模型将学会证明定理。我们的关键观察是，大量未出现在形式化证明中的非形式化信息对于学习证明定理可能是有用的。例如，人类会逐步思考证明过程，但这种思考过程在最终生成的代码中是不可见的。我们提出了Lean-STaR框架，用于训练语言模型在证明的每一步之前生成非形式化的思考，从而提升模型的定理证明能力。Lean-STaR利用回顾性的真实策略来生成合成思考数据，以训练语言模型。在推理阶段，训练后的模型直接在预测每个证明步骤的策略之前生成思考。基于自教导推理器框架，我们进一步应用专家迭代方法，对模型在Lean求解器验证下采样的正确证明进行微调。Lean-STaR在Lean定理证明环境中的miniF2F-test基准测试上取得了最先进的结果，显著优于基础模型（$\boldsymbol{43.4\% \rightarrow 46.3\%,}$ Pass@64）。我们还分析了增强思考对定理证明过程各方面的影响，深入探讨了其有效性。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/