By scaling test-time compute, large reasoning models (LRMs) can solve complex problems through explicit chain-of-thought (CoT) reasoning. However, they often overthink, which produces redundant tokens and degrades accuracy. Existing methods for mitigating this issue remain limited: training-based approaches require substantial computational resources, while training-free methods rely on carefully crafted prompts or unreliable confidence signals. In this work, we investigate early stopping from the perspective of attention distributions and propose a simple method, ASAG, which infers the model's reasoning state and adaptively adjusts the generation strategy. The proposed framework is training-free and plug-and-play, so it integrates into existing LRMs without modification. Extensive experiments on nine benchmarks demonstrate consistent improvements across mainstream LRMs of varying parameter scales, including the DeepSeek-R1-Distill and Qwen3 series. Specifically, on Qwen3-8B, ASAG improves average accuracy by 3.2% while reducing the number of generated tokens by nearly 40% across all reasoning tasks.