Large language models~(LLMs) present an intriguing avenue of exploration in the domain of formal theorem proving. Nonetheless, the full utilization of these models, particularly in terms of demonstration formatting and organization, remains an underexplored area. In an endeavor to enhance the efficacy of LLMs, we introduce a subgoal-based demonstration learning framework, consisting of two primary elements: Firstly, drawing upon the insights of subgoal learning from the domains of reinforcement learning and robotics, we propose the construction of distinct subgoals for each demonstration example and refine these subgoals in accordance with the pertinent theories of subgoal learning. Secondly, we build upon recent advances in diffusion models to predict the optimal organization, simultaneously addressing two intricate issues that persist within the domain of demonstration organization: subset selection and order determination. Through the integration of subgoal-based learning methodologies, we have successfully increased the prevailing proof accuracy from 38.9\% to 44.3\% on the miniF2F benchmark. Furthermore, the adoption of diffusion models for demonstration organization can lead to an additional enhancement in accuracy to 45.5\%, or a $5\times$ improvement in sampling efficiency compared with the long-standing state-of-the-art method. Our code is available at \url{https://github.com/HKUNLP/subgoal-theorem-prover}.
翻译:大语言模型(LLMs)在形式化定理证明领域展现出引人探索的前景。然而,这类模型的充分利用——特别是在演示格式组织与编排方面——仍是一个尚未充分探索的领域。为提升LLMs的功效,我们提出一种基于子目标的演示学习框架,包含两个核心要素:首先,借鉴强化学习与机器人领域中的子目标学习洞见,我们为每个演示示例构建独特的子目标,并根据子目标学习的相关理论对其进行优化。其次,我们基于扩散模型的最新进展来预测最优编排,同时解决演示组织领域中持续存在的两个复杂问题:子集选择与顺序确定。通过集成基于子目标的学习方法,我们成功将miniF2F基准测试上的证明准确率从38.9%提升至44.3%。此外,采用扩散模型进行演示组织可进一步将准确率提升至45.5%,或相比长期保持的最优方法实现5倍的采样效率提升。我们的代码开源在\url{https://github.com/HKUNLP/subgoal-theorem-prover}。