Reasoning large language models (LLMs) have demonstrated strong capabilities in solving complex problems by generating long chains of thought (CoT), but such lengthy CoTs incur high inference costs. In this study, we introduce ES-CoT, an inference-time method that shortens CoT generation by detecting answer convergence and stopping early, with minimal performance loss. At the end of each reasoning step, we prompt the LLM to output its current final answer, which we call a step answer. We then track the run length of consecutive identical step answers as a measure of answer convergence. Once the run length exhibits a sharp increase and exceeds a minimum threshold, generation is terminated. We provide both empirical and theoretical support for this heuristic: step answers steadily converge to the final answer, and large run-length jumps reliably mark this convergence. Experiments on five reasoning datasets across three LLMs show that ES-CoT reduces the number of inference tokens by about 41\% on average while maintaining accuracy comparable to standard CoT. Moreover, ES-CoT integrates seamlessly with self-consistency prompting and remains robust across hyperparameter choices, making it a practical and effective approach to efficient reasoning.
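For illustration, the minimal Python sketch below shows one way the run-length-based stopping rule described above could be checked after each reasoning step. It is a sketch under assumptions, not the authors' implementation: the function name `should_stop` and the hyperparameters `min_run` and `jump_factor` are hypothetical stand-ins for the minimum run length and the "sharp increase" criterion.

```python
from typing import List


def should_stop(step_answers: List[str],
                min_run: int = 4,
                jump_factor: float = 2.0) -> bool:
    """Illustrative early-stopping check in the spirit of ES-CoT.

    step_answers: the model's current final answer extracted after each
    reasoning step so far. `min_run` and `jump_factor` are hypothetical
    hyperparameters (minimum run length and relative jump) standing in
    for the thresholds described in the abstract.
    """
    if not step_answers:
        return False

    # Compute run lengths of consecutive identical step answers.
    runs: List[int] = [1]
    for prev, cur in zip(step_answers, step_answers[1:]):
        if cur == prev:
            runs[-1] += 1
        else:
            runs.append(1)

    current_run = runs[-1]
    previous_runs = runs[:-1]

    # Terminate only when the current run is long enough and sharply
    # larger than any earlier run, i.e. a run-length jump that marks
    # answer convergence.
    if current_run < min_run:
        return False
    baseline = max(previous_runs) if previous_runs else 1
    return current_run >= jump_factor * baseline
```

In use, the caller would append each newly elicited step answer to `step_answers` and stop decoding as soon as `should_stop` returns True, then report the last step answer as the final answer.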