Large Reasoning Models (LRMs) often suffer from overthinking, generating unnecessarily long reasoning chains even for simple tasks. This leads to substantial computational overhead with little performance gain, primarily due to redundant verification and repetitive generation. Prior work typically constrains output length or optimizes only for correctness, but such coarse supervision fails to guide models toward concise yet accurate inference. In this paper, we propose ENTRA, an entropy-based training framework that suppresses redundant reasoning while preserving performance. ENTRA first estimates token-level importance with a lightweight Bidirectional Importance Estimation (BIE) method that accounts for both prediction confidence and forward influence. It then computes a redundancy reward from the entropy of low-importance tokens, normalized by its theoretical upper bound, and optimizes this reward via reinforcement learning. Experiments on mathematical reasoning benchmarks demonstrate that ENTRA reduces output length by 37% to 53% with no loss in accuracy, and in some cases with gains. Our approach offers a principled and efficient solution to overthinking in LRMs, and a generalizable path toward redundancy-aware reasoning optimization.
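The normalized redundancy reward described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the importance threshold `tau`, the per-token entropy inputs, and the use of the mean over low-importance tokens are all assumptions; only the normalization by the theoretical maximum entropy, log|V| for a vocabulary of size |V|, follows directly from the text.

```python
import math

def redundancy_reward(token_entropies, importances, tau=0.5, vocab_size=32000):
    """Hypothetical sketch of ENTRA's redundancy reward.

    token_entropies: per-token predictive entropy (nats) from the model.
    importances: per-token importance scores (e.g., from a BIE-style
        estimator); tokens with importance below `tau` count as redundant.
    Returns the mean entropy of low-importance tokens, normalized by the
    theoretical upper bound log|V| (entropy of a uniform distribution
    over the vocabulary), so the reward lies in [0, 1].
    """
    low = [h for h, w in zip(token_entropies, importances) if w < tau]
    if not low:
        return 0.0  # no redundant tokens detected
    h_max = math.log(vocab_size)  # maximum possible token entropy
    return sum(low) / (len(low) * h_max)
```

A scalar in [0, 1] like this can be combined with a correctness reward inside a standard RL objective (e.g., PPO or GRPO); how the two terms are weighted and signed is left to the full method description.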