Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism

Chain of thought (CoT) fine-tuning aims to endow large language models (LLMs) with reasoning capabilities by training them on curated reasoning traces. It leverages both supervised and reinforced fine-tuning to cultivate human-like reasoning skills in LLMs, including detailed planning, divergent thinking, intuitive judgment, timely reflection, internal thinking, and fact perception. As CoT fine-tuning has advanced, LLMs have demonstrated substantial improvements in tasks such as mathematical reasoning and code generation. However, existing surveys of CoT fine-tuning focus primarily on technical aspects and overlook a systematic analysis from the perspective of human reasoning mechanisms. Given that the ultimate goal of CoT fine-tuning is to enable LLMs to reason like humans, it is crucial to investigate this technique through the lens of human cognition. To fill this gap, we present the first comprehensive survey of CoT fine-tuning grounded in human reasoning theory. Specifically, we classify and examine CoT fine-tuning methods through the lens of the well-known Six Thinking Hats framework, which systematically characterizes common human thinking modes using six metaphorical hats. Building upon this theory, we also outline potential directions for future research in CoT fine-tuning. In addition, we compile a comprehensive overview of existing datasets and model performance, and we maintain a GitHub repository\footnote{https://github.com/AI-Chen/Awesome-CoT-Finetuning} that continuously tracks recent advances in this area. We hope this survey will serve as a valuable resource to inspire innovation and foster progress in this rapidly evolving field.
