Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

Recent advances in reasoning large language models (RLLMs), such as OpenAI-O1 and DeepSeek-R1, have demonstrated impressive capabilities in complex domains like mathematics and coding. A central factor in their success is long chain-of-thought (Long CoT) reasoning, whose characteristics enhance reasoning ability and enable the solution of intricate problems. However, despite these developments, a comprehensive survey of Long CoT is still lacking, which limits our understanding of how it differs from traditional short chain-of-thought (Short CoT) and complicates ongoing debates on issues like "overthinking" and "test-time scaling." This survey seeks to fill this gap by offering a unified perspective on Long CoT. (1) We first distinguish Long CoT from Short CoT and introduce a novel taxonomy to categorize current reasoning paradigms. (2) Next, we explore the key characteristics of Long CoT: deep reasoning, extensive exploration, and feasible reflection, which enable models to handle more complex tasks and produce more efficient, coherent outcomes compared to the shallower Short CoT. (3) We then investigate key phenomena associated with Long CoT, including the emergence of these characteristics, overthinking, and test-time scaling, offering insights into how these processes manifest in practice. (4) Finally, we identify significant research gaps and highlight promising future directions, including the integration of multi-modal reasoning, efficiency improvements, and enhanced knowledge frameworks. By providing a structured overview, this survey aims to inspire future research and further the development of logical reasoning in artificial intelligence.

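Translation: Recently, reasoning large language models, represented by OpenAI-O1 and DeepSeek-R1, have shown outstanding capabilities in complex domains such as mathematics and programming. The key to their success is the application of long chain-of-thought characteristics, which strengthen reasoning ability and allow the models to solve complex problems. However, although related research keeps advancing, a systematic survey of long chain-of-thought is still missing, which limits our understanding of how it differs from traditional short chain-of-thought and makes it hard to deepen the discussion of issues such as "overthinking" and "test-time scaling." This survey aims to fill this gap and provide a unified theoretical framework for long chain-of-thought: (1) it first clearly distinguishes long chain-of-thought from short chain-of-thought and proposes a new taxonomy to organize current mainstream reasoning paradigms; (2) it then analyzes the three core characteristics of long chain-of-thought, namely deep reasoning, broad exploration, and feasible reflection, and explains the mechanisms by which it produces more efficient and coherent outputs than shallow short chain-of-thought on complex tasks; (3) it next examines the key phenomena that accompany long chain-of-thought, including how features such as overthinking and test-time scaling emerge, and shows how these processes appear in practice; (4) finally, it identifies important gaps in current research and looks ahead to future directions such as multi-modal reasoning integration, efficiency optimization, and enhanced knowledge frameworks. By building a systematic body of knowledge, this survey hopes to stimulate follow-up research and advance the continued development of logical reasoning in artificial intelligence.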