Large Reasoning Language Models (LRLMs, or LRMs) demonstrate remarkable capabilities on complex reasoning tasks, but suffer from significant computational inefficiency due to the overthinking phenomenon. Existing efficient-reasoning methods struggle to balance reasoning quality against inference-cost reduction. We propose \textbf{Adaptive Reasoning Suppression (ARS)}, a novel training-free approach that dynamically suppresses redundant reasoning steps while preserving accuracy through adaptive certainty monitoring. ARS introduces a multi-checkpoint certainty estimation mechanism with progressive suppression thresholds, achieving greater efficiency than static suppression methods. Extensive evaluation on mathematical reasoning benchmarks across multiple model architectures demonstrates that ARS reduces tokens, latency, and energy consumption by up to 53%, 46.1%, and 57.9%, respectively, while maintaining or improving accuracy.
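To make the mechanism concrete, below is a minimal sketch of multi-checkpoint certainty estimation with progressive suppression thresholds. This is not the paper's implementation: the certainty proxy (mean token probability), the threshold schedule, and all names (`certainty`, `progressive_threshold`, `should_suppress`, `tau0`, `decay`, `tau_min`) are illustrative assumptions.

```python
import math

def certainty(segment_logprobs):
    """Illustrative certainty proxy: mean token probability over the most
    recent reasoning segment (the paper's exact estimator may differ)."""
    return sum(math.exp(lp) for lp in segment_logprobs) / len(segment_logprobs)

def progressive_threshold(checkpoint_idx, tau0=0.95, decay=0.05, tau_min=0.70):
    """Hypothetical progressive schedule: the suppression threshold starts
    strict and relaxes at later checkpoints, so early reasoning is rarely
    cut while late, likely-redundant steps are suppressed more aggressively."""
    return max(tau_min, tau0 - decay * checkpoint_idx)

def should_suppress(segment_logprobs, checkpoint_idx):
    """At a checkpoint, decide whether to suppress further reasoning and
    steer the model toward emitting its final answer."""
    return certainty(segment_logprobs) >= progressive_threshold(checkpoint_idx)
```

In a decoding loop, `should_suppress` would be evaluated at fixed intervals (e.g., every N generated tokens); once it fires, generation is redirected to the answer, which is where the token, latency, and energy savings would come from under this reading of the abstract.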