Long chain-of-thought (LongCoT) reasoning is central to the recent breakthroughs of large language models on complex reasoning tasks. However, it comes with the problem of "underthinking": models reason shallowly, switching between thoughts frequently without exploring any of them sufficiently, which limits both performance and token efficiency. To address this problem, we propose a simple yet effective reasoning strategy, the SmartSwitch inference framework. It can be integrated into any large language model as a plug-and-play solution, and it continuously monitors the model's reasoning process to detect underthinking and guide the model toward deeper exploration of promising but overlooked thoughts. Specifically, the perception module identifies points where the model switches thoughts and evaluates the potential of the preceding thought using an off-the-shelf process reward model (PRM). If a high-potential thought has been abandoned prematurely, the intervention module interrupts the ongoing inference, backtracks to the point before the switch, and inserts a "deepening prompt" to encourage further exploration along that promising path. Extensive experiments on challenging mathematical reasoning benchmarks demonstrate that our method significantly improves the performance of large language models of various sizes.
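The perceive-then-intervene loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the cue phrases in `SWITCH_CUES`, the wording of `DEEPENING_PROMPT`, the `threshold` and `max_interventions` parameters, and the toy stand-ins `toy_generate` and `toy_prm` that replace a real language model and a real process reward model.

```python
from typing import Callable, List, Optional

# Hypothetical cue phrases that mark a thought switch (an assumption, not the paper's detector).
SWITCH_CUES = ("alternatively", "wait", "another approach")

# Hypothetical wording for the deepening prompt.
DEEPENING_PROMPT = "This line of reasoning looks promising; let me explore it further."


def is_switch(step: str) -> bool:
    """Perception, part 1: does this step open a new thought?"""
    return step.strip().lower().startswith(SWITCH_CUES)


def smart_switch(
    generate_step: Callable[[List[str]], Optional[str]],
    score_thought: Callable[[List[str]], float],
    max_steps: int = 32,
    threshold: float = 0.7,
    max_interventions: int = 2,
) -> List[str]:
    """Generate a reasoning trace, intervening when a promising thought is dropped."""
    trace: List[str] = []
    thought_start = 0  # index in `trace` where the current thought began
    interventions = 0
    while len(trace) < max_steps:
        step = generate_step(trace)
        if step is None:  # the model has finished
            break
        if is_switch(step) and trace[thought_start:]:
            # Perception, part 2: score the thought that is about to be abandoned.
            score = score_thought(trace[thought_start:])
            if score >= threshold and interventions < max_interventions:
                # Intervention: discard the switch step (backtrack) and
                # insert the deepening prompt in its place.
                trace.append(DEEPENING_PROMPT)
                interventions += 1
                continue
            thought_start = len(trace)  # accept the switch; a new thought starts here
        trace.append(step)
    return trace


def toy_generate(trace: List[str]) -> Optional[str]:
    """Scripted stand-in for a language model that tends to underthink."""
    if not trace:
        return "Try factoring the quadratic."
    last = trace[-1]
    if last == DEEPENING_PROMPT:
        return "Factoring gives (x - 2)(x - 3) = 0, so x = 2 or x = 3."
    if last.startswith("Try factoring"):
        return "Alternatively, use a numeric search."
    return None


def toy_prm(steps: List[str]) -> float:
    """Stand-in for a process reward model: favors the factoring thought."""
    return 0.9 if any("factoring" in s.lower() for s in steps) else 0.1


if __name__ == "__main__":
    for line in smart_switch(toy_generate, toy_prm):
        print(line)
```

In this toy run the switch to a numeric search is discarded because the factoring thought scores above the threshold, so the trace continues along the factoring path. A real integration would have to backtrack at the token level inside the decoder and call an actual PRM; this sketch only models the control flow at the granularity of whole reasoning steps.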