When leveraging language models for reasoning tasks, generating explicit chain-of-thought (CoT) steps is often essential for achieving high accuracy in the final output. In this paper, we investigate whether models can be taught to internalize these CoT steps. To this end, we propose a simple yet effective method: starting with a model trained for explicit CoT reasoning, we gradually remove the intermediate steps and finetune the model. This process allows the model to internalize the intermediate reasoning steps, simplifying the reasoning process while maintaining high performance. Our approach enables a GPT-2 Small model to solve 9-by-9 multiplication with up to 99% accuracy, whereas standard training cannot solve beyond 4-by-4 multiplication. Furthermore, our method proves effective on larger language models such as Mistral 7B, achieving over 50% accuracy on GSM8K without producing any intermediate steps.
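To make the idea of gradually removing intermediate steps concrete, the following is a minimal sketch of how training examples might be built at different stages of finetuning. It is an illustration under stated assumptions, not the exact procedure used in this work: the linear schedule, the choice to drop CoT tokens from the beginning of the chain, and the names `tokens_to_remove` and `build_example` are all hypothetical.

```python
from typing import List


def tokens_to_remove(step: int, steps_per_stage: int, tokens_per_stage: int) -> int:
    """Hypothetical linear schedule: number of CoT tokens dropped at a training step.

    Every `steps_per_stage` optimizer steps, `tokens_per_stage` more tokens are removed.
    """
    return (step // steps_per_stage) * tokens_per_stage


def build_example(
    question: List[str], cot: List[str], answer: List[str], n_removed: int
) -> List[str]:
    """Build one training sequence with the first `n_removed` CoT tokens dropped.

    Dropping from the start of the chain is an assumption made for this sketch.
    Once `n_removed` reaches len(cot), the sequence is just question + answer.
    """
    kept = cot[min(n_removed, len(cot)):]
    return question + kept + answer


if __name__ == "__main__":
    # Toy multiplication example: 12 * 34 = 48 + 360 = 408.
    question = ["12", "*", "34", "="]
    cot = ["48", "+", "360"]
    answer = ["408"]

    for step in range(0, 4000, 1000):
        n = tokens_to_remove(step, steps_per_stage=1000, tokens_per_stage=1)
        print(step, build_example(question, cot, answer, n))
```

In this sketch, the model would be finetuned on the sequences produced at each stage before more tokens are removed, so that by the final stage it is trained to map the question directly to the answer.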