Recent advances in large language models (LLMs) have demonstrated remarkable abilities across a variety of downstream natural language processing (NLP) tasks, including mathematical tasks that require multi-step reasoning. In this report, we introduce KwaiYiiMath, which enhances the mathematical reasoning abilities of KwaiYiiBase by applying Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) on both English and Chinese mathematical tasks. We also construct a small-scale Chinese primary school mathematics test set (named KMath), consisting of 188 examples, to evaluate the correctness of the problem-solving process generated by the models. Empirical studies demonstrate that KwaiYiiMath achieves state-of-the-art (SOTA) performance on GSM8k, CMath, and KMath among models of similar size.