Chain of Thought (CoT) prompting plays a significant role in improving the reasoning abilities of large language models (LLMs). However, the correlation between the effectiveness of CoT and the length of the reasoning steps in prompts remains largely unknown. To shed light on this, we conducted several empirical experiments to explore this relationship. Specifically, we designed experiments that expand and compress the reasoning steps of the rationales within CoT demonstrations while keeping all other factors constant. We have the following key findings. First, lengthening the reasoning steps in prompts, even without adding new information, considerably enhances LLMs' reasoning abilities across multiple datasets. Conversely, shortening the reasoning steps, even while preserving the key information, significantly diminishes the models' reasoning abilities. This finding highlights the importance of the number of steps in CoT prompts and provides practical guidance for making better use of LLMs' potential in complex problem-solving scenarios. Second, we investigated the relationship between CoT performance and the rationales used in demonstrations. Surprisingly, the results show that even incorrect rationales can yield favorable outcomes if they maintain the requisite length of inference. Third, we observed that the advantages of increasing reasoning steps are task-dependent: simpler tasks require fewer steps, whereas complex tasks gain significantly from longer inference sequences.
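To make the expand/compress manipulation concrete, here is a minimal sketch. It assumes a demonstration is stored as a question, a list of rationale steps, and a final answer. The names `Demonstration`, `expand`, and `compress` are hypothetical, and so are the specific filler steps (restating the question, re-checking the answer). They illustrate the idea of "more steps, no new information" and "fewer steps, same information"; they are not the authors' actual prompts or code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demonstration:
    """A CoT demonstration: question, rationale steps, final answer (hypothetical format)."""
    question: str
    steps: List[str]
    answer: str

    def render(self) -> str:
        # Lay the demonstration out as numbered steps, one per line.
        lines = [f"Q: {self.question}"]
        lines += [f"Step {i}: {s}" for i, s in enumerate(self.steps, start=1)]
        lines.append(f"A: The answer is {self.answer}.")
        return "\n".join(lines)


def expand(demo: Demonstration) -> Demonstration:
    """Lengthen the chain without adding new information (illustrative fillers)."""
    # A step that only restates the question already given.
    restate = f"Let me restate the question: {demo.question}"
    # A step that re-checks the conclusion the existing steps already reach.
    verify = f"Let me double-check: the steps above lead to {demo.answer}."
    return Demonstration(demo.question, [restate, *demo.steps, verify], demo.answer)


def compress(demo: Demonstration, max_steps: int) -> Demonstration:
    """Shorten the chain to at most max_steps steps, keeping all original text."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    n = len(demo.steps)
    if n <= max_steps:
        return demo
    size = -(-n // max_steps)  # ceiling division: consecutive steps per merged step
    merged = [" ".join(demo.steps[i:i + size]) for i in range(0, n, size)]
    return Demonstration(demo.question, merged, demo.answer)


if __name__ == "__main__":
    demo = Demonstration(
        "Tom has 3 apples and buys 2 more. How many apples does he have?",
        ["Tom starts with 3 apples.", "He buys 2 more.", "3 + 2 = 5."],
        "5",
    )
    print(expand(demo).render())
    print()
    print(compress(demo, 2).render())
```

Merging consecutive steps, rather than deleting any, keeps the key information intact while reducing the step count. This mirrors the "shorten while preserving key information" condition described above, under the stated assumptions.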