Theorem proving is an important challenge for large language models (LLMs), as formal proofs can be checked rigorously by proof assistants such as Lean, leaving no room for hallucination. Existing LLM-based provers try to prove theorems in a fully autonomous mode without human intervention. In this mode, they struggle with novel and challenging theorems, for which human insights may be critical. In this paper, we explore LLMs as copilots that assist humans in proving theorems. We introduce Lean Copilot, a framework for running LLM inference in Lean. It enables programmers to build various LLM-based proof automation tools that integrate seamlessly into the workflow of Lean users. Using Lean Copilot, we build tools for suggesting proof steps (tactic suggestion), completing intermediate proof goals (proof search), and selecting relevant premises (premise selection) using LLMs. Users can use our pretrained models or bring their own ones that run either locally (with or without GPUs) or on the cloud. Experimental results demonstrate the effectiveness of our method in assisting humans and automating theorem proving process compared to existing rule-based proof automation in Lean. We open source all codes under a permissive MIT license to facilitate further research.
翻译:定理证明是大语言模型面临的重要挑战,因为形式化证明可通过Lean等证明辅助器严格验证,杜绝模型幻觉。现有基于大语言模型的定理证明器试图在全自主模式下进行证明,无需人工干预。但在该模式下,这些模型难以应对需要人类洞察力的新颖挑战性定理。本文探索将大语言模型作为辅助人类证明定理的“副驾驶”。我们提出Lean Copilot框架,实现在Lean中运行大语言模型推理的功能。该框架支持开发者构建多种基于大语言模型的证明自动化工具,并与Lean用户的工作流无缝集成。基于Lean Copilot,我们构建了使用大语言模型进行证明步骤建议(策略建议)、中间证明目标补全(证明搜索)及相关前提筛选(前提选择)的工具。用户既可使用我们预训练的模型,也可自带模型在本地(含/不含GPU)或云端运行。实验结果表明,与Lean中现有的基于规则的证明自动化方法相比,我们的方法在辅助人类与自动化定理证明流程方面更具效能。我们以宽松的MIT许可证开源全部代码以促进后续研究。