Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. However, even the most advanced open-source LLMs, such as the LLaMA family, still struggle to solve complex multi-step mathematical problems accurately. In this paper, we present \textbf{Math-Shepherd}, a process-oriented math verifier that assigns a reward score to each step of an LLM's solution to a math problem. Math-Shepherd is trained on automatically constructed process-wise supervision data, removing the heavy reliance on manual annotation that bottlenecks existing work. Guided by Math-Shepherd, a series of open-source LLMs achieve strong performance. Among them, DeepSeek 67B \citep{DeepSeek-llm} stands out, reaching 93.3\% accuracy on GSM8K and 48.1\% on MATH without external enhancements such as tool use. Math-Shepherd also outperforms the self-consistency method and other existing verification models. We believe that automatic process supervision holds significant potential for the future evolution of LLMs.
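To make the contrast between step-level verification and self-consistency concrete, the following is a minimal Python sketch of how per-step reward scores could be used to pick one answer from several sampled solutions. It is an illustration under stated assumptions, not the Math-Shepherd implementation: the names `solution_score`, `verifier_rerank`, `self_consistency`, and `toy_step_scorer` are hypothetical, the toy scorer is a hard-coded stand-in for a trained verifier, and taking the minimum step score as the solution score is an assumed aggregation rule that the abstract does not specify.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A candidate solution: (ordered reasoning steps, final answer).
Candidate = Tuple[List[str], str]


def solution_score(step_scores: Sequence[float]) -> float:
    """Collapse per-step rewards into a single solution score.

    Assumption: use the minimum, so a single low-scoring step
    sinks the whole solution. Other aggregations are possible.
    """
    if not step_scores:
        raise ValueError("a solution needs at least one step")
    return min(step_scores)


def verifier_rerank(
    candidates: Sequence[Candidate],
    step_scorer: Callable[[str], float],
) -> str:
    """Return the answer of the candidate with the highest solution score."""
    best = max(
        candidates,
        key=lambda cand: solution_score([step_scorer(s) for s in cand[0]]),
    )
    return best[1]


def self_consistency(candidates: Sequence[Candidate]) -> str:
    """Baseline: majority vote over final answers, ignoring the steps."""
    votes = Counter(answer for _, answer in candidates)
    return votes.most_common(1)[0][0]


def toy_step_scorer(step: str) -> float:
    """Hypothetical stand-in for a trained process verifier.

    A real verifier would be a model that scores each step in context;
    here one known-bad step simply gets a low hard-coded reward.
    """
    return 0.1 if step == "2 + 3 = 6" else 0.9


# Three sampled solutions to "(2 + 3) * 4"; two share the same early mistake.
candidates: List[Candidate] = [
    (["2 + 3 = 5", "5 * 4 = 20"], "20"),
    (["2 + 3 = 6", "6 * 4 = 24"], "24"),
    (["2 + 3 = 6", "6 * 4 = 24"], "24"),
]

majority_answer = self_consistency(candidates)
reranked_answer = verifier_rerank(candidates, toy_step_scorer)
```

In this toy setup the majority vote follows the two solutions that share the faulty first step, while the step-level reranker prefers the one solution whose steps all score well. This shows the kind of case where process-level scores can help, and is not evidence about the reported results.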