Recent studies have shown that Large Language Models (LLMs) augmented with chain-of-thought (CoT) reasoning demonstrate impressive problem-solving abilities. In this work, however, we identify a recurring issue: these models occasionally generate overly short reasoning, which degrades performance even on simple mathematical problems. Specifically, we investigate how reasoning length is embedded in the hidden representations of reasoning models and how it affects accuracy. Our analysis reveals that reasoning length is governed by a linear direction in the representation space, which allows us to induce overly short reasoning by steering the model along this direction. Building on this insight, we introduce ThinkEdit, a simple yet effective weight-editing approach that mitigates overly short reasoning. We first identify a small subset of attention heads (approximately 2%) that predominantly drive short reasoning behavior. We then edit the output projection weights of these heads to suppress the short reasoning direction. With changes to only 0.1% of the model's parameters, ThinkEdit effectively reduces overly short reasoning and yields notable accuracy gains on short reasoning outputs (+5.44%), along with an overall improvement across multiple math benchmarks (+2.43%). Our findings provide new mechanistic insights into how reasoning length is controlled within LLMs and highlight the potential of fine-grained model interventions to improve reasoning quality. Our code is available at https://github.com/Trustworthy-ML-Lab/ThinkEdit
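To make the weight-editing idea concrete, the following is a minimal NumPy sketch of what suppressing a direction in one head's output projection could look like. It is an illustration under stated assumptions, not the ThinkEdit implementation: the abstract does not give the edit formula, so the projection `W_o (I - v v^T)`, the difference-of-means estimate of the direction, the toy dimensions, and all names (`suppress_direction`, `short_direction`, `long_states`, `short_states`) are hypothetical. The released repository is the reference for the actual method.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


def suppress_direction(w_o, direction):
    """Remove the component along `direction` from everything a head writes.

    w_o:       (d_head, d_model) output projection, row-vector convention
               (head contribution to the residual stream = z @ w_o).
    direction: (d_model,) vector in the residual stream.

    Assumption: the edit is an orthogonal projection, w_o @ (I - v v^T).
    """
    v = unit(direction)
    # Same as w_o @ (I - v v^T), without building the d_model x d_model matrix.
    return w_o - np.outer(w_o @ v, v)


rng = np.random.default_rng(0)
d_head, d_model = 8, 32  # toy sizes, not those of any real model

# Hypothetical hidden states collected from long and short reasoning traces.
long_states = rng.normal(size=(100, d_model))
short_states = rng.normal(size=(100, d_model)) + 0.5

# Assumption: the "short reasoning" direction is a difference of means.
short_direction = unit(short_states.mean(axis=0) - long_states.mean(axis=0))

w_o = rng.normal(size=(d_head, d_model))
w_o_edited = suppress_direction(w_o, short_direction)

z = rng.normal(size=(d_head,))  # one head's attention output for one token
before = (z @ w_o) @ short_direction
after = (z @ w_o_edited) @ short_direction
print(f"contribution along short direction: before={before:.4f}, after={after:.2e}")
```

After the edit, the head can no longer write anything along `short_direction`, while its contribution along every orthogonal direction is unchanged. Applying such an edit only to a small set of selected heads is what would keep the fraction of modified parameters small.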