Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, where they expend redundant computational steps on simple problems, or underthinking, where they fail to explore enough reasoning paths despite being capable of doing so. These issues cause inefficiency and can reduce accuracy, limiting practical deployment in resource-constrained settings. Existing methods for mitigating overthinking, such as suppressing reflective keywords or adjusting reasoning length, may inadvertently induce underthinking and thereby compromise accuracy. We therefore propose ReBalance, a training-free framework that achieves efficient reasoning with balanced thinking. ReBalance uses confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking through consistent overconfidence. By aggregating hidden states from a small-scale dataset into reasoning mode prototypes, we compute a steering vector that guides the reasoning trajectories of LRMs. A dynamic control function modulates the strength and direction of this vector based on real-time confidence, pruning redundancy during overthinking and promoting exploration during underthinking. Extensive experiments on four models ranging from 0.5B to 32B parameters, across nine benchmarks in math reasoning, general question answering, and coding tasks, demonstrate that ReBalance reduces output redundancy while improving accuracy, offering a general, training-free, plug-and-play strategy for efficient and robust LRM deployment. Project page and code are available at https://rebalance-ai.github.io.
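To make the mechanism described above concrete, the following is a minimal sketch of confidence-gated steering, not the ReBalance implementation. Everything in it is an assumption made for illustration: the function names (`steering_vector`, `control_weight`, `steer`), the use of a difference of mean hidden states as the steering vector, the thresholds, and the sign convention. The abstract describes a continuous control function; the piecewise rule here is a deliberate simplification.

```python
import numpy as np


def steering_vector(over_states, under_states):
    """Unit vector pointing from the underthinking prototype to the
    overthinking prototype (assumed: prototypes are mean hidden states)."""
    proto_over = np.asarray(over_states, dtype=float).mean(axis=0)
    proto_under = np.asarray(under_states, dtype=float).mean(axis=0)
    v = proto_over - proto_under
    return v / np.linalg.norm(v)


def control_weight(confidences, var_threshold=0.02, conf_threshold=0.9, scale=1.0):
    """Signed steering strength from a window of recent step confidences.

    Hypothetical rule: high variance is read as overthinking (negative
    weight, steer away from the overthinking prototype); a consistently
    high mean is read as underthinking (positive weight, steer toward
    more exploration); otherwise no steering is applied.
    """
    c = np.asarray(confidences, dtype=float)
    if c.var() > var_threshold:
        return -scale
    if c.mean() > conf_threshold:
        return scale
    return 0.0


def steer(hidden, v, weight):
    """Shift one hidden state along the steering direction."""
    return np.asarray(hidden, dtype=float) + weight * v


# Toy data: 2-dimensional "hidden states" collected for each reasoning mode.
v = steering_vector([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
w = control_weight([0.2, 0.9, 0.3, 0.95])  # fluctuating confidence
h_new = steer(np.zeros(2), v, w)
```

In this toy run the fluctuating confidence window is classified as overthinking, so the hidden state is pushed in the direction opposite to the overthinking prototype. In a real model the shift would be applied to a chosen layer's activations during decoding, which this sketch does not attempt.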