Formal theorem proving, a field at the intersection of mathematics and computer science, has seen renewed interest with advancements in large language models (LLMs). This paper introduces SubgoalXL, a novel approach that synergizes subgoal-based proofs with expert learning to enhance LLMs' capabilities in formal theorem proving within the Isabelle environment. SubgoalXL addresses two critical challenges: the scarcity of specialized mathematics and theorem-proving data, and the need for improved multi-step reasoning abilities in LLMs. By optimizing data efficiency and employing subgoal-level supervision, SubgoalXL extracts richer information from limited human-generated proofs. The framework integrates subgoal-oriented proof strategies with an expert learning system, iteratively refining formal statement, proof, and subgoal generators. Leveraging the Isabelle environment's advantages in subgoal-based proofs, SubgoalXL achieves a new state-of-the-art performance of 56.1\% in Isabelle on the standard miniF2F dataset, marking an absolute improvement of 4.9\%. Notably, SubgoalXL successfully solves 41 AMC12, 9 AIME, and 3 IMO problems from miniF2F. These results underscore the effectiveness of maximizing limited data utility and employing targeted guidance for complex reasoning in formal theorem proving, contributing to the ongoing advancement of AI reasoning capabilities. The implementation is available at \url{https://github.com/zhaoxlpku/SubgoalXL}.
翻译:形式化定理证明作为数学与计算机科学的交叉领域,随着大语言模型(LLMs)的进展而重新受到关注。本文提出SubgoalXL,一种将基于子目标的证明与专家学习相结合的新方法,旨在增强LLMs在Isabelle环境中的形式化定理证明能力。SubgoalXL解决了两个关键挑战:专业数学与定理证明数据的稀缺性,以及LLMs在多步推理能力方面的提升需求。通过优化数据效率并采用子目标级监督,SubgoalXL从有限的人工生成证明中提取更丰富的信息。该框架将面向子目标的证明策略与专家学习系统相结合,迭代优化形式化陈述、证明及子目标生成器。借助Isabelle环境在基于子目标的证明中的优势,SubgoalXL在标准miniF2F数据集上于Isabelle中实现了56.1%的最新最优性能,绝对提升了4.9%。值得注意的是,SubgoalXL成功解决了miniF2F中的41道AMC12、9道AIME及3道IMO问题。这些结果证明了在形式化定理证明中最大化有限数据效用并采用针对性指导以处理复杂推理的有效性,为人工智能推理能力的持续进步作出了贡献。实现代码可在\url{https://github.com/zhaoxlpku/SubgoalXL}获取。