Large Language Models (LLMs) have exhibited remarkable reasoning capabilities and have become the foundation of language technologies. Inspired by the success of code data in training LLMs, we ask at which training stage introducing code data actually helps LLMs' reasoning. To this end, this paper systematically explores the impact of code data on LLMs at different stages. Concretely, we introduce code data at the pre-training stage, at the instruction-tuning stage, and at both stages. We then evaluate the reasoning capability of LLMs comprehensively and fairly on six reasoning tasks in five domains. We critically analyze the experimental results and draw three conclusions. First, pre-training LLMs on a mixture of code and text can significantly enhance their general reasoning capability with almost no negative transfer to other tasks. Second, at the instruction-tuning stage, code data endows LLMs with task-specific reasoning capability. Third, a dynamic mixing strategy for code and text data helps LLMs learn reasoning capability step by step during training. These insights deepen the understanding of LLMs' reasoning ability and inform applications such as scientific question answering and legal support. The source code and model parameters are released at~\url{https://github.com/yingweima2022/CodeLLM}.