Instruction-tuned language models can answer the same causal-reasoning question differently after its English variable names are replaced by type-preserving placeholders, although the structural causal model and the gold answer are unchanged. We ask whether this lexical gap reflects information loss in the placeholder view or a misaligned read-out from a representation that still carries answer-relevant content. Vernier uses a paired-view weight update as an instrument and then inspects the mechanism left after the gap closes. In the working regimes, the evidence favours representational misalignment. A variable-name probe becomes more accurate on the placeholder view, and activation patching on Qwen-7B, Qwen-14B, and Llama-3.1-8B shows that the decision-token representation can transfer answer identity between views. The update that realigns the views is counterfactual augmentation over original and placeholder prompts, while the answer-subspace KL mainly sharpens intermediate answer-belief agreement. Success is bounded by model family, scale, and task. CRASS transfer is reliable across Qwen scales and Llama, e-CARE remains weak, and preliminary non-causal rename tasks show a similar qualitative pattern.
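As an illustration of the paired views described above, the following is a minimal sketch of how a placeholder view might be built from an original prompt. It is not the authors' pipeline: the function name `make_placeholder_view`, the `var_types` mapping, and the `TYPE_k` placeholder format are all assumptions introduced here, and the sketch only assumes that each variable name comes with a coarse type label.

```python
import re


def make_placeholder_view(prompt: str, var_types: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace variable names with type-preserving placeholders.

    `var_types` maps each English variable name to a coarse type label
    (hypothetical labels such as "event" or "agent"). The placeholder
    format TYPE_k is an assumption made for this sketch.
    """
    if not var_types:
        return prompt, {}

    # Number the placeholders per type, in the order the names are given.
    counters: dict[str, int] = {}
    mapping: dict[str, str] = {}
    for name, vtype in var_types.items():
        counters[vtype] = counters.get(vtype, 0) + 1
        mapping[name] = f"{vtype.upper()}_{counters[vtype]}"

    # Match whole words only, trying longer names first so that a short
    # name never pre-empts a longer name that contains it.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    # A single substitution pass, so inserted placeholders are never rewritten.
    renamed = pattern.sub(lambda m: mapping[m.group(1)], prompt)
    return renamed, mapping


original = "If smoking causes tar and tar causes cancer, does smoking affect cancer?"
placeholder, mapping = make_placeholder_view(
    original, {"smoking": "event", "tar": "event", "cancer": "event"}
)
print(placeholder)
```

Because the renaming touches only the variable names, the causal structure stated in the prompt and the gold answer are the same in both views, which is the property the paired-view comparison relies on.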