When a language model processes a hallucinated response, its attention routing tends to fail in one of two shapes: over-concentrating on a narrow set of positions, or spreading so diffusely that relevance is diluted. The shape of the failure carries diagnostic signal. We study these shapes as a diagnostic characterization, computed from attention matrices under \emph{forced scoring} of benchmark-labeled responses rather than during live generation. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport \emph{capacity}. We prove that every transpose-invariant spectral diagnostic of this operator is structurally \emph{orientation-blind}: it cannot distinguish an operator from its transpose, and therefore cannot detect the direction of information flow. A converse to the blindness theorem bounds the transpose sensitivity of any Lipschitz diagnostic by the asymmetry coefficient $G$. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\phi\ge 1/5$, while window attention pierces the floor as $O(w/n)$; the failure modes differ in shape, not just in value. This floor is an idealized-architecture benchmark, not an empirical attractor: the fraction of real attention heads that pierce it is itself an architectural signature. The resulting two-axis diagnostic ($\phi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck-dominated and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (0.62--0.84 LC-AUROC) across the tested decoder-only, encoder-only, and encoder-decoder models, with polarity reversing as predicted between HaluEval and MedHallu.
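As a minimal illustration of why a diagnostic built on the symmetric component cannot see direction, consider the simplified, unnormalized splitting below. The symbols $S$, $K$, and $f$ are introduced here for illustration only, the degree normalization is omitted, and this sketch is not the formal statement of the blindness theorem. For a square operator $A$, write
\[
  S(A) = \tfrac{1}{2}\bigl(A + A^{\top}\bigr), \qquad
  K(A) = \tfrac{1}{2}\bigl(A - A^{\top}\bigr), \qquad
  A = S(A) + K(A).
\]
Transposition leaves the symmetric part unchanged and flips the sign of the antisymmetric part:
\[
  S(A^{\top}) = S(A), \qquad K(A^{\top}) = -K(A).
\]
Consequently, any quantity of the form $f\bigl(\operatorname{spec} S(A)\bigr)$ takes the same value on $A$ and on $A^{\top}$. In this simplified setting, all information about orientation resides in $K(A)$, which motivates a second axis that measures asymmetry alongside the capacity axis.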