The chain of thought, i.e., step-by-step reasoning, is one of the fundamental mechanisms of Transformers. While the design of intermediate reasoning steps has been extensively studied and shown to critically influence performance on multi-step mathematical reasoning tasks, the ordering of these steps has received little attention, despite its significant effect on the difficulty of reasoning. This study addresses a novel task of unraveling the chain of thought -- reordering decoder input tokens into a sequence from which Transformers can learn arithmetic tasks more easily. The proposed pipeline first trains a Transformer on a mixture of target sequences arranged in different orders and then identifies benign orders as those whose loss drops fastest in the early stage of training. As the search space grows factorially with sequence length, we propose a two-stage hierarchical approach of inter- and intra-block reordering. Experiments on seven order-sensitive arithmetic tasks show that our method identifies a learning-friendly order out of a few billion candidates. Notably, it recovers the reverse-digit order reported in prior studies for the multiplication task.
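To make the factorial blow-up and the hierarchical reduction concrete, here is a minimal sketch of the candidate counts involved. The 12-token sequence split into 3 blocks of 4, and the stage-wise trial count, are illustrative assumptions, not the paper's exact decomposition or search procedure:

```python
import math


def full_search_space(seq_len: int) -> int:
    """Number of candidate orders when every token permutation is allowed."""
    return math.factorial(seq_len)


def hierarchical_search_space(num_blocks: int, block_len: int) -> int:
    """Orders expressible by permuting blocks and tokens within each block."""
    return math.factorial(num_blocks) * math.factorial(block_len) ** num_blocks


def stagewise_trials(num_blocks: int, block_len: int) -> int:
    """Rough trial count if the two stages are searched separately:
    all inter-block orders first, then all intra-block orders per block
    (an assumed cost model, for illustration only)."""
    return math.factorial(num_blocks) + num_blocks * math.factorial(block_len)


# A 12-token target sequence split into 3 blocks of 4 tokens each
print(full_search_space(12))            # 479001600
print(hierarchical_search_space(3, 4))  # 82944
print(stagewise_trials(3, 4))           # 78
```

Even at this toy scale, restricting candidates to block-structured permutations and searching the two stages separately shrinks the number of orders that must be evaluated by several orders of magnitude.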