In this paper, we introduce AceMath, a suite of frontier math models that excel at solving complex math problems, along with highly effective reward models capable of evaluating generated solutions and reliably identifying the correct ones. To develop the instruction-tuned math models, we propose a supervised fine-tuning (SFT) process that first achieves competitive performance across general domains, followed by targeted fine-tuning for the math domain using a carefully curated set of prompts and synthetically generated responses. The resulting model, AceMath-72B-Instruct, greatly outperforms Qwen2.5-Math-72B-Instruct, GPT-4o, and Claude-3.5 Sonnet. To develop a math-specialized reward model, we first construct AceMath-RewardBench, a comprehensive and robust benchmark for evaluating math reward models across diverse problems and difficulty levels. We then present a systematic approach for building our math reward models. The resulting model, AceMath-72B-RM, consistently outperforms state-of-the-art reward models. Furthermore, when combining AceMath-72B-Instruct with AceMath-72B-RM, we achieve the highest average rm@8 score across the math reasoning benchmarks. We will release model weights, training data, and evaluation benchmarks at: https://research.nvidia.com/labs/adlr/acemath
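The rm@8 metric above refers to reward-model best-of-n selection: sample n = 8 candidate solutions per problem, score each with the reward model, and keep the top-scoring one. The sketch below illustrates this selection step only; the candidate strings and the scoring function are hypothetical stand-ins, not the actual AceMath models or API.

```python
# Minimal sketch of rm@k selection (here k = 8): given k sampled candidate
# solutions, score each with a reward model and return the highest-scoring
# one. `toy_score` is a hypothetical stand-in for a real reward model.

def rm_select(candidates, score_fn):
    """Return the candidate with the highest reward-model score."""
    return max(candidates, key=score_fn)

# Toy illustration: 8 sampled candidates with stand-in reward scores.
candidates = [f"solution-{i}" for i in range(8)]
toy_score = lambda s: int(s.split("-")[1])  # hypothetical scorer
best = rm_select(candidates, toy_score)
print(best)  # "solution-7"
```

In practice the scorer would be a forward pass of the reward model over (problem, candidate) pairs, and rm@8 accuracy is the fraction of problems for which the selected candidate is correct.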