We provide evidence of performative chain-of-thought (CoT) in reasoning models, where a model becomes strongly confident in its final answer but continues generating tokens without revealing its internal belief. We compare activation probing, early forced answering, and a CoT monitor across two large models (DeepSeek-R1 671B and GPT-OSS 120B) and find task-difficulty-specific differences: the model's final answer is decodable from activations far earlier in the CoT than a CoT monitor is able to identify it, especially for easy recall-based MMLU questions. We contrast this with genuine reasoning on difficult multihop GPQA-Diamond questions. Even so, inflection points (e.g., backtracking, "aha" moments) occur almost exclusively in responses where probes show large belief shifts, suggesting these behaviors track genuine uncertainty rather than learned "reasoning theater." Finally, probe-guided early exit reduces tokens by up to 80% on MMLU and 30% on GPQA-Diamond with similar accuracy, positioning attention probing as an efficient tool for detecting performative reasoning and enabling adaptive computation.
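To make the idea of probe-guided early exit concrete, the following is a minimal sketch, not the procedure used in this work. It assumes a hypothetical probe that emits a probability distribution over answer options after each CoT step; the function name `early_exit_step`, the `threshold` of 0.95, and the `patience` of 2 consecutive confident steps are all illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def early_exit_step(
    probe_probs: Sequence[Sequence[float]],
    threshold: float = 0.95,  # assumed confidence cutoff, not from the source
    patience: int = 2,        # assumed number of consecutive confident steps
) -> Optional[Tuple[int, int]]:
    """Return (step_index, answer_index) at which generation could stop.

    probe_probs[t][k] is the (hypothetical) probe's probability for answer
    option k after CoT step t. Returns None if the probe never stays
    confident in one answer for `patience` consecutive steps.
    """
    streak = 0
    last_answer = None
    for t, probs in enumerate(probe_probs):
        # Most likely answer according to the probe at this step.
        answer = max(range(len(probs)), key=probs.__getitem__)
        if probs[answer] >= threshold:
            # Extend the streak only if the confident answer is unchanged.
            streak = streak + 1 if answer == last_answer else 1
            last_answer = answer
            if streak >= patience:
                return t, answer
        else:
            # Confidence dropped: reset and keep generating.
            streak = 0
            last_answer = None
    return None


# Toy trace: the probe settles on option 1 from step 1 onward.
trace = [
    [0.30, 0.40, 0.30],
    [0.02, 0.96, 0.02],
    [0.02, 0.97, 0.01],
    [0.01, 0.98, 0.01],
]
result = early_exit_step(trace)
```

On this toy trace the sketch stops at step 2 with answer option 1, since that is the second consecutive step at which the probe's top answer clears the threshold. Requiring a short streak rather than a single confident step is one simple way to avoid exiting during a transient belief shift; the appropriate threshold would need to be calibrated per task.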