A Language-Agent Approach to Formal Theorem-Proving

Language agents, which use a large language model (LLM) capable of in-context learning to interact with an external environment, have recently emerged as a promising approach to control tasks. We present the first language-agent approach to formal theorem-proving. Our method, COPRA, uses a high-capacity, black-box LLM (GPT-4) as part of a policy for a stateful backtracking search. During the search, the policy can select proof tactics and retrieve lemmas and definitions from an external database. Each selected tactic is executed in the underlying proof framework, and the execution feedback is used to build the prompt for the next policy invocation. The search also tracks selected information from its history and uses it to reduce hallucinations and unnecessary LLM queries. We evaluate our implementation of COPRA on the miniF2F benchmark for Lean and a set of Coq tasks from the CompCert project. On these benchmarks, COPRA significantly outperforms one-shot invocations of GPT-4, as well as state-of-the-art models fine-tuned on proof data, at finding correct proofs quickly. Our code and data are available at https://github.com/trishullab/copra.
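
The search loop described in the abstract can be illustrated with a small, self-contained sketch. This is not COPRA's implementation: the proof framework is replaced by a hypothetical lookup table (`TRANSITIONS`), the LLM by a hard-coded `stub_policy`, and lemma retrieval is left out entirely. All names here are assumptions made for illustration. The sketch shows only the general shape of the idea: a policy proposes a tactic, the tactic is executed, the feedback is passed to the next policy call, and a record of failed tactics per state avoids repeating choices already known to fail.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

State = str
Tactic = str

# Hypothetical toy "proof framework": (state, tactic) -> next state.
# Any pair not listed here counts as a tactic error.
TRANSITIONS: Dict[Tuple[State, Tactic], State] = {
    ("goal", "intro"): "subgoal_a",
    ("goal", "induction"): "subgoal_b",
    ("subgoal_a", "simp"): "stuck",   # dead end: nothing applies from "stuck"
    ("subgoal_b", "simp"): "qed",
}
DONE: State = "qed"

def execute(state: State, tactic: Tactic) -> Optional[State]:
    """Run a tactic in the toy framework; None signals an error."""
    return TRANSITIONS.get((state, tactic))

Policy = Callable[[State, Set[Tactic], Optional[str]], Optional[Tactic]]

def stub_policy(state: State, failed: Set[Tactic],
                feedback: Optional[str]) -> Optional[Tactic]:
    """Stand-in for the LLM call: first tactic not yet known to fail here."""
    for tactic in ["intro", "induction", "simp"]:
        if tactic not in failed:
            return tactic
    return None  # nothing left to try: ask the search to backtrack

def search(start: State, policy: Policy,
           max_queries: int = 20) -> Optional[List[Tactic]]:
    """Backtracking search that remembers failed tactics for each state."""
    stack: List[State] = [start]           # states on the current path
    proof: List[Tactic] = []               # tactics on the current path
    failed: Dict[State, Set[Tactic]] = {}  # history used to prune choices
    feedback: Optional[str] = None
    queries = 0
    while True:
        state = stack[-1]
        if state == DONE:
            return proof
        if queries >= max_queries:
            return None
        queries += 1
        tactic = policy(state, failed.get(state, set()), feedback)
        if tactic is None:
            if len(stack) == 1:
                return None                # no options left at the root
            stack.pop()                    # undo the last step
            bad = proof.pop()
            failed.setdefault(stack[-1], set()).add(bad)
            feedback = f"backtracked: {bad!r} led to a dead end"
            continue
        nxt = execute(state, tactic)
        if nxt is None or nxt in stack:    # error, or a cycle with no progress
            failed.setdefault(state, set()).add(tactic)
            feedback = f"tactic {tactic!r} failed in state {state!r}"
        else:
            stack.append(nxt)
            proof.append(tactic)
            feedback = None

print(search("goal", stub_policy))
```

In this toy setting the search first follows `intro`, reaches the dead end, backtracks to the root, and then tries `induction`. A real system would build a prompt from the proof state and feedback instead of calling `stub_policy`, and would execute tactics in Lean or Coq rather than a lookup table.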


