Multi-step reasoning instructions, such as chain-of-thought prompting, are widely adopted to elicit better performance from language models (LMs). We report on the systematic strategy that LMs employ in such multi-step reasoning processes. Our controlled experiments reveal that LMs rely more heavily on heuristics, such as lexical overlap, in the earlier stages of reasoning, when more reasoning steps remain before the goal is reached. Conversely, their reliance on heuristics decreases as LMs progress toward the final answer over successive reasoning steps. This suggests that LMs can backtrack only a limited number of future steps and dynamically combine heuristic strategies with rational ones in tasks involving multi-step reasoning.