Chain of Thought (CoT) prompting plays a significant role in improving the reasoning abilities of large language models (LLMs). However, the relationship between the effectiveness of CoT and the length of the reasoning steps in prompts remains largely unknown. To shed light on this, we conduct several empirical experiments. Specifically, we design experiments that expand and compress the rationale steps within CoT demonstrations while keeping all other factors constant. We report the following key findings. First, lengthening the reasoning steps in prompts, even without adding new information, considerably enhances LLMs' reasoning abilities across multiple datasets. Conversely, shortening the reasoning steps, even while preserving the key information, significantly diminishes the models' reasoning abilities. This finding highlights the importance of the number of steps in CoT prompts and provides practical guidance for making better use of LLMs' potential in complex problem-solving scenarios. Second, we investigate the relationship between CoT performance and the rationales used in demonstrations. Surprisingly, the results show that even incorrect rationales can yield favorable outcomes if they maintain the requisite length of inference. Third, we observe that the benefits of increasing reasoning steps are task-dependent: simpler tasks require fewer steps, whereas complex tasks gain significantly from longer inference sequences.