While large models pre-trained on high-quality data exhibit excellent performance on mathematical reasoning benchmarks (e.g., GSM8k, MultiArith), it remains challenging to specialize smaller models for these tasks. Common approaches to this challenge include knowledge distillation from large teacher models and data augmentation (e.g., rephrasing questions and generating synthetic solutions). Despite these efforts, smaller models still struggle with arithmetic computations, which leads to errors in mathematical reasoning. In this work, we leverage a programmatically generated synthetic arithmetic dataset to enhance the reasoning capabilities of smaller models. We investigate two approaches to incorporating this dataset: (1) intermediate fine-tuning, in which a model is fine-tuned on the arithmetic dataset before it is trained on a reasoning dataset, and (2) integrating the arithmetic dataset into an instruction-tuning mixture, so that the model learns arithmetic skills alongside general instruction-following abilities. Our experiments on multiple reasoning benchmarks demonstrate that incorporating an arithmetic dataset, whether through targeted fine-tuning or within an instruction-tuning mixture, enhances models' arithmetic capabilities and thereby improves their mathematical reasoning performance.
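To make the setup concrete, the following is a minimal sketch of how a synthetic arithmetic dataset could be generated programmatically and then used in the two ways described above. It is an illustration under stated assumptions, not the generator or training pipeline used in this work: the operator set, the operand range, the question template, and the function names (`make_example`, `make_dataset`, `intermediate_schedule`, `instruction_mixture`) are all hypothetical.

```python
import operator
import random

# Hypothetical operator set; the abstract does not say which operations are covered.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_example(rng, max_operand=999):
    """Build one question/answer pair with an exactly computed answer."""
    a = rng.randint(0, max_operand)
    b = rng.randint(0, max_operand)
    symbol = rng.choice(sorted(OPS))
    # The question template is an assumption made for illustration.
    return {
        "question": f"What is {a} {symbol} {b}?",
        "answer": str(OPS[symbol](a, b)),
    }


def make_dataset(n, seed=0):
    """Generate n examples; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


def intermediate_schedule(arithmetic, reasoning):
    """Approach (1): two sequential fine-tuning stages, arithmetic first."""
    return [("arithmetic", arithmetic), ("reasoning", reasoning)]


def instruction_mixture(instruction, arithmetic, seed=0):
    """Approach (2): a single shuffled pool of instruction and arithmetic examples."""
    pool = list(instruction) + list(arithmetic)
    random.Random(seed).shuffle(pool)
    return pool
```

Because the answers are computed rather than sampled from a model, every label in such a dataset is correct by construction. The two helper functions differ only in how the data reaches the model: as a separate earlier stage, or interleaved with the instruction-tuning examples.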