Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

In this paper, we investigate the underlying factors that can enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning in modern LLMs is far from saturated: model quality continues to improve as the quantity of data increases. To support this claim, we introduce the Skywork-Math model series, obtained by supervised fine-tuning (SFT) of common 7B LLMs on our proposed 2.5M-instance Skywork-MathQA dataset. Skywork-Math 7B achieves accuracies of 51.2% on the competition-level MATH benchmark and 83.9% on the GSM8K benchmark using only SFT data, outperforming an early version of GPT-4 on MATH. The superior performance of the Skywork-Math models is attributable to our novel two-stage data synthesis and model SFT pipelines, which include three different augmentation methods and a diverse seed problem set, ensuring both the quantity and quality of the Skywork-MathQA dataset across varying difficulty levels. Most importantly, we provide several practical takeaways for enhancing math reasoning abilities in LLMs in both research and industry applications.

