Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

When a language model processes a hallucinated response, its attention routing tends to fail in one of two shapes: it over-concentrates on a narrow set of positions, or it spreads so diffusely that relevance is diluted. The shape of the failure carries diagnostic signal. We study these shapes as a diagnostic characterization, computed from attention matrices under \emph{forced scoring} of benchmark-labeled responses rather than during live generation. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport \emph{capacity}; we prove that every transpose-invariant spectral diagnostic of this operator is structurally \emph{orientation-blind} (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction). A converse to the blindness theorem bounds any Lipschitz diagnostic's transpose sensitivity by the asymmetry coefficient $G$. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\phi \ge 1/5$, while window attention pierces the floor as $O(w/n)$; the failure modes are shape-different, not just value-different. This floor is an idealized-architecture benchmark, not an empirical attractor: the fraction of real attention heads that pierce it is itself an architectural signature. The resulting two-axis diagnostic ($\phi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (0.62--0.84 LC-AUROC) across the tested decoder-only, encoder-only, and encoder-decoder models, with polarity reversing as predicted between HaluEval and MedHallu.
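
The orientation-blindness claim can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes the symmetric component is $W = (A + A^\top)/2$ normalized as $D^{-1/2} W D^{-1/2}$ with $D$ the row sums of $W$, and it uses a hypothetical stand-in for the asymmetry coefficient, $\|A - A^\top\|_F / \|A + A^\top\|_F$; the abstract does not define $G$, so that ratio and all function names below are illustrative assumptions.

```python
import math

def uniform_causal_attention(n):
    """Row i attends uniformly to positions 0..i (lower-triangular, row-stochastic)."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def normalized_symmetric_part(a):
    """Assumed form: D^{-1/2} W D^{-1/2}, W = (A + A^T)/2, D = diag(row sums of W)."""
    n = len(a)
    w = [[0.5 * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in w]
    return [[w[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]

def asymmetry_ratio(a):
    """Hypothetical stand-in for G: ||A - A^T||_F / ||A + A^T||_F."""
    n = len(a)
    num = sum((a[i][j] - a[j][i]) ** 2 for i in range(n) for j in range(n))
    den = sum((a[i][j] + a[j][i]) ** 2 for i in range(n) for j in range(n))
    return math.sqrt(num / den)

A = uniform_causal_attention(6)
At = transpose(A)

# The symmetric operator built from A and from A^T is the same matrix,
# so any diagnostic computed from it (spectral or otherwise) cannot tell them apart.
S_A = normalized_symmetric_part(A)
S_At = normalized_symmetric_part(At)
max_gap = max(abs(x - y) for ra, rb in zip(S_A, S_At) for x, y in zip(ra, rb))

# A itself is far from symmetric, which the stand-in ratio registers.
g = asymmetry_ratio(A)
print(f"max |S(A) - S(A^T)| = {max_gap:.3e}, asymmetry ratio = {g:.3f}")
```

Note that this stand-in ratio is itself unchanged by transposition: it measures how much directional information the symmetric part discards, which matches the role the abstract gives $G$ as a bound on transpose sensitivity, but it does not by itself say which way information flows.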
