The chain of thought, i.e., step-by-step reasoning, is one of the fundamental mechanisms of Transformers. While the design of intermediate reasoning steps has been extensively studied and shown to critically influence performance on multi-step mathematical reasoning tasks, the ordering of these steps has received little attention, despite its significant effect on the difficulty of reasoning. This study addresses a novel task of unraveling the chain of thought -- reordering decoder input tokens into a sequence from which Transformers can learn arithmetic tasks more easily. The proposed pipeline first trains a Transformer on a mixture of target sequences arranged in different orders and then identifies benign orders as those whose loss drops fastest in the early stage of training. As the search space grows factorially with sequence length, we propose a two-stage hierarchical approach of inter- and intra-block reordering. Experiments on seven order-sensitive arithmetic tasks show that our method identifies a learning-friendly order out of a few billion candidates. Notably, it recovers the reverse-digit order reported in prior studies for the multiplication task.
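To make the factorial blow-up and the hierarchical reduction concrete, here is a minimal sketch of the candidate counts involved. The 12-token sequence split into 3 blocks of 4, and the stage-wise trial count, are illustrative assumptions, not the paper's exact decomposition or search procedure:

```python
import math


def full_search_space(seq_len: int) -> int:
    """Number of candidate orders when every token permutation is allowed."""
    return math.factorial(seq_len)


def hierarchical_search_space(num_blocks: int, block_len: int) -> int:
    """Orders expressible by permuting blocks and tokens within each block."""
    return math.factorial(num_blocks) * math.factorial(block_len) ** num_blocks


def stagewise_trials(num_blocks: int, block_len: int) -> int:
    """Rough trial count if the two stages are searched separately:
    all inter-block orders first, then all intra-block orders per block
    (an assumed cost model, for illustration only)."""
    return math.factorial(num_blocks) + num_blocks * math.factorial(block_len)


# A 12-token target sequence split into 3 blocks of 4 tokens each
print(full_search_space(12))            # 479001600
print(hierarchical_search_space(3, 4))  # 82944
print(stagewise_trials(3, 4))           # 78
```

Even at this toy scale, restricting candidates to block-structured permutations and searching the two stages separately shrinks the number of orders that must be evaluated by several orders of magnitude.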