ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models

Recent studies have shown that Large Language Models (LLMs) augmented with chain-of-thought (CoT) reasoning demonstrate impressive problem-solving abilities. However, in this work, we identify a recurring issue where these models occasionally generate overly short reasoning, leading to degraded performance on even simple mathematical problems. Specifically, we investigate how reasoning length is embedded in the hidden representations of reasoning models and its impact on accuracy. Our analysis reveals that reasoning length is governed by a linear direction in the representation space, allowing us to induce overly short reasoning by steering the model along this direction. Building on this insight, we introduce ThinkEdit, a simple yet effective weight-editing approach to mitigate the issue of overly short reasoning. We first identify a small subset of attention heads (approximately 2%) that predominantly drive short reasoning behavior. We then edit the output projection weights of these heads to suppress the short reasoning direction. With changes to only 0.1% of the model's parameters, ThinkEdit effectively reduces overly short reasoning and yields notable accuracy gains for short reasoning outputs (+5.44%), along with an overall improvement across multiple math benchmarks (+2.43%). Our findings provide new mechanistic insights into how reasoning length is controlled within LLMs and highlight the potential of fine-grained model interventions to improve reasoning quality. Our code is available at https://github.com/Trustworthy-ML-Lab/ThinkEdit
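The abstract describes three steps. The first is estimating a linear "reasoning length" direction. The second is steering hidden states along that direction. The third is editing the output projection of a few attention heads so they can no longer write along it.

The sketch below illustrates those steps on synthetic data. It rests on several assumptions that the abstract does not state:

- the direction is the difference of mean hidden states between short and long reasoning traces at one layer;
- heads are ranked by the mean projection of their residual-stream contribution onto that direction;
- the edit projects the direction out of each selected head's slice of the output-projection matrix.

The matrix uses the `(out_features, in_features)` layout of a PyTorch `nn.Linear` weight, so `y = W_o @ z`. All function names, sizes, and the top-k selection rule are hypothetical illustrations. This is not code from the ThinkEdit repository.

```python
import numpy as np

def length_direction(h_short: np.ndarray, h_long: np.ndarray) -> np.ndarray:
    """Unit difference-of-means direction pointing from long toward short reasoning.

    h_short, h_long: hidden states of shape (n_examples, d_model) from one layer.
    """
    v = h_short.mean(axis=0) - h_long.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(h: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha * v to every hidden state; alpha > 0 pushes toward shorter reasoning."""
    return h + alpha * v

def head_scores(W_o: np.ndarray, z: np.ndarray, v: np.ndarray, d_head: int) -> np.ndarray:
    """Mean projection of each head's residual-stream contribution onto v.

    W_o: output projection, shape (d_model, n_heads * d_head), applied as y = W_o @ z.
    z:   concatenated per-head attention outputs, shape (n_examples, n_heads * d_head).
    """
    n_heads = W_o.shape[1] // d_head
    scores = np.empty(n_heads)
    for h in range(n_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        contrib = z[:, cols] @ W_o[:, cols].T  # (n_examples, d_model)
        scores[h] = (contrib @ v).mean()
    return scores

def edit_heads(W_o: np.ndarray, v: np.ndarray, heads, d_head: int) -> np.ndarray:
    """Return a copy of W_o whose selected head blocks cannot write along v."""
    W = W_o.copy()
    P = np.eye(W.shape[0]) - np.outer(v, v)  # projector onto the complement of v
    for h in heads:
        cols = slice(h * d_head, (h + 1) * d_head)
        W[:, cols] = P @ W[:, cols]
    return W

# Toy demo on synthetic data; every size and value here is made up.
rng = np.random.default_rng(0)
d_model, n_heads, d_head, n = 32, 8, 4, 200
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)
h_long = rng.normal(size=(n, d_model))
h_short = rng.normal(size=(n, d_model)) + 3.0 * true_dir  # plant a "short" shift

v = length_direction(h_short, h_long)
print("cosine with planted direction:", round(float(v @ true_dir), 3))

W_o = rng.normal(size=(d_model, n_heads * d_head))
z = rng.normal(size=(n, n_heads * d_head)) + 1.0  # nonzero mean so heads differ
scores = head_scores(W_o, z, v, d_head)
top = np.argsort(scores)[::-1][:2]  # the 2 heads pushing hardest along v
W_edit = edit_heads(W_o, v, top, d_head)

changed = np.mean(~np.isclose(W_o, W_edit))
print("edited heads:", sorted(top.tolist()), "fraction of W_o changed:", round(float(changed), 3))
```

After the edit, the selected heads' contributions have zero projection onto `v`. Every other head's weights are untouched. In this toy setting the edited share of `W_o` is large because there are only 8 heads. The abstract's 0.1% figure refers to the parameters of the whole model.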
