Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during the whole theorem proving process. However, as we all know, creating new useful theorems or even new theories is not only helpful but crucial and necessary for advancing mathematics and proving harder and deeper results. In this work, we present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. These skills are further evolved (by prompting an LLM) to enrich the library on another scale. Modular and reusable skills are constantly added to the library to enable tackling increasingly intricate mathematical problems. Moreover, the learned library further bridges the gap between human proofs and formal proofs by making it easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 47.1%). During the proving process, LEGO-Prover also manages to generate over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%. We also release our code and all the generated skills.
翻译:尽管大型语言模型(LLMs)取得了成功,但定理证明任务仍是最困难的推理任务之一,远未完全解决。以往基于语言模型的方法已展现出潜力,但即使面对中学级别的定理也仍难以证明。这些方法的共同局限在于假设整个证明过程使用固定的定理库。然而,创建新的有用定理甚至新理论,对推动数学发展、证明更困难更深层的结果至关重要且不可或缺。本研究提出LEGO-Prover,它采用包含已验证引理作为技能的成长类库,以增强用于定理证明的LLM能力。通过模块化构建证明,LEGO-Prover使LLM能在证明过程中利用库中检索到的现有技能,并创建新技能。这些技能进一步演化(通过提示LLM)以在更大尺度上丰富类库。模块化且可复用的技能持续被添加至类库,使系统能够应对日益复杂的数学问题。此外,学得的类库通过更易补全缺失步骤,进一步弥合了人工证明与形式化证明之间的鸿沟。LEGO-Prover在miniF2F-valid(48.0%提升至57.0%)和miniF2F-test(45.5%提升至47.1%)上刷新了最优通过率。在证明过程中,LEGO-Prover还生成了超过20,000个技能(定理/引理)并加入成长类库。消融实验表明,这些新增技能确实有助于定理证明,成功率达从47.1%提升至50.4%。我们同时开源了代码及所有生成的技能。