From Efficiency to Adaptivity: A Deeper Look at Adaptive Reasoning in Large Language Models

Recent advances in large language models (LLMs) have made reasoning a central benchmark for evaluating intelligence. While prior surveys focus on efficiency by examining how to shorten reasoning chains or reduce computation, this view overlooks a fundamental challenge: current LLMs apply uniform reasoning strategies regardless of task complexity, generating long traces for trivial problems while failing to extend reasoning on difficult tasks. This survey reframes reasoning through the lens of adaptivity: the capability to allocate reasoning effort based on input characteristics such as difficulty and uncertainty. We make three contributions. First, we formalize deductive, inductive, and abductive reasoning within the LLM context, connecting these classical cognitive paradigms with their algorithmic realizations. Second, we cast adaptive reasoning as a control-augmented policy optimization problem that balances task performance against computational cost, distinguishing learned policies from inference-time control mechanisms. Third, we propose a systematic taxonomy that organizes existing methods into training-based approaches, which internalize adaptivity through reinforcement learning, supervised fine-tuning, and learned controllers, and training-free approaches, which achieve adaptivity through prompt conditioning, feedback-driven halting, and modular composition. This framework clarifies how different mechanisms realize adaptive reasoning in practice and enables systematic comparison across diverse strategies. We conclude by identifying open challenges in self-evaluation, meta-reasoning, and human-aligned reasoning control.
