We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1. Further refinement is achieved through reinforcement learning from proof assistant feedback (RLPAF). Beyond the single-pass whole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths. DeepSeek-Prover-V1.5 demonstrates significant improvements over DeepSeek-Prover-V1, achieving new state-of-the-art results on the test set of the high school level miniF2F benchmark ($63.5\%$) and the undergraduate level ProofNet benchmark ($25.3\%$).
翻译:我们介绍了DeepSeek-Prover-V1.5,一个专为Lean 4定理证明设计的开源语言模型。该模型通过优化训练和推理过程,对DeepSeek-Prover-V1进行了增强。该模型在DeepSeekMath-Base上进行预训练,并专门针对形式化数学语言进行了专业化。随后,它使用源自DeepSeek-Prover-V1的增强形式化定理证明数据集进行了监督微调。进一步的改进是通过基于证明助手反馈的强化学习实现的。除了DeepSeek-Prover-V1所采用的单次完整证明生成方法外,我们提出了RMaxTS,这是一种蒙特卡洛树搜索的变体,它采用内在奖励驱动的探索策略来生成多样化的证明路径。DeepSeek-Prover-V1.5相比DeepSeek-Prover-V1展现出显著提升,在高中水平的miniF2F基准测试集上取得了新的最先进结果($63.5\%$),并在本科水平的ProofNet基准测试集上达到了$25.3\%$的准确率。