Reasoning Models Know What's Important, and Encode It in Their Activations

Language models often solve complex tasks by generating long reasoning chains, consisting of many steps with varying importance. While some steps are crucial for generating the final answer, others are removable. Determining which steps matter most, and why, remains an open question central to understanding how models process reasoning. We investigate whether this question is better approached through model internals or through the tokens of the reasoning chain itself. We find that model activations contain more information than tokens for identifying important reasoning steps. Crucially, by training probes on model activations to predict importance, we show that models encode an internal representation of step importance, even prior to the generation of subsequent steps. The internal representations of importance in different models yield high agreement on which steps are important. The representation is distributed across layers, and does not correlate with surface-level features, such as a step's relative position or its length. Our findings suggest that analyzing activations can reveal aspects of reasoning that surface-level approaches fundamentally miss, indicating that reasoning analyses should look into model internals.
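
As a rough illustration of what "training probes on model activations to predict importance" can look like, the following is a minimal sketch of a linear (logistic-regression) probe. It is not the authors' implementation: the activations are synthetic, the binary importance labels are fabricated with a planted direction, and the names `make_synthetic_steps`, `train_linear_probe`, and `probe_accuracy` are hypothetical helpers introduced only for this example. In a real setting, each row would be a hidden-state vector extracted at a reasoning step, and each label would come from some measure of that step's effect on the final answer.

```python
import numpy as np

def make_synthetic_steps(n_steps=400, hidden_dim=32, seed=0):
    """Fabricate per-step 'activations' with a planted importance direction (toy data only)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_steps)  # 1 = important step, 0 = removable step
    direction = rng.normal(size=hidden_dim)
    direction /= np.linalg.norm(direction)
    noise = rng.normal(size=(n_steps, hidden_dim))
    # Shift each step along the planted direction: +2 if important, -2 otherwise.
    activations = noise + 2.0 * (2 * labels[:, None] - 1) * direction
    return activations, labels

def train_linear_probe(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = probs - y  # gradient of the mean cross-entropy w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of steps whose predicted importance label matches the true label."""
    preds = (X @ w + b > 0).astype(int)
    return float((preds == y).mean())

# Train on the first 300 synthetic steps, evaluate on the remaining 100.
X, y = make_synthetic_steps()
split = 300
w, b = train_linear_probe(X[:split], y[:split])
test_acc = probe_accuracy(X[split:], y[split:], w, b)
print(f"held-out probe accuracy: {test_acc:.2f}")
```

A linear probe is used here because its capacity is low: if such a probe separates important from removable steps on held-out data, the signal is plausibly present in the activations rather than learned by the probe itself.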

