Modern transformer attention is internally multi-agent (heads compete and coordinate), yet we train it as if it were a monolithic optimizer. We formalize this gap: cross-entropy training induces an implicit potential game among attention heads, and gradient descent converges to Nash equilibria whose inefficiency is potentially unbounded because externalities (redundancy, correlated errors) go unpriced. Our main result bounds the Price of Anarchy (PoA) by $\Gamma(G)$, the off-diagonal mass of a head interaction matrix that captures weight and gradient coupling. Under mild smoothness assumptions, we prove that both \emph{excess hallucination probability} and \emph{excess head redundancy} scale with the PoA, unifying two distinct failure modes under a single mechanism. The bound is prescriptive: regularization that reduces $\Gamma(G)$ provably tightens the PoA bound. We instantiate this as GAME-LoRA, which combines Barlow Twins decorrelation with a log-determinant coordination pressure. Experiments validate the theory: $\Gamma(G)$ predicts hallucination ($p{<}0.05$), emergent coalitions exhibit selective coordination, and GAME-LoRA reduces hallucination by up to 18\% (8\% on average) with no knowledge degradation, a Pareto improvement inaccessible to methods that ignore the game structure.
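For concreteness, the following is a minimal PyTorch sketch of the two quantities the abstract names: the off-diagonal mass $\Gamma(G)$ and a GAME-LoRA-style regularizer pairing Barlow Twins decorrelation with a log-determinant coordination term. The function names, the scalar pooling of head features, and the coefficients `lam_bt` and `lam_ld` are illustrative assumptions, not the paper's released implementation.

```python
import torch

def gamma(G: torch.Tensor) -> torch.Tensor:
    """Off-diagonal mass of the head interaction matrix G:
    the sum of |G_ij| over all i != j (one plausible reading
    of the abstract's definition)."""
    return G.abs().sum() - G.diagonal().abs().sum()

def game_lora_regularizer(head_feats: torch.Tensor,
                          lam_bt: float = 1.0,
                          lam_ld: float = 0.1,
                          eps: float = 1e-4) -> torch.Tensor:
    """Decorrelation plus coordination penalty over per-head features.

    head_feats: (batch, heads, dim) pooled per-head representations.
    """
    B, H, _ = head_feats.shape
    # Pool each head to a scalar summary, then standardize across the
    # batch so that C below is an (H, H) cross-head correlation matrix.
    z = head_feats.mean(dim=-1)
    z = (z - z.mean(dim=0)) / (z.std(dim=0) + eps)
    C = (z.T @ z) / B
    # Barlow Twins-style redundancy reduction: drive off-diagonal
    # correlations between heads toward zero.
    off_diag = C - torch.diag_embed(C.diagonal())
    bt_term = (off_diag ** 2).sum()
    # Log-determinant coordination pressure: reward a well-conditioned,
    # diverse head ensemble (maximizing log det, so subtract it).
    ld_term = -torch.logdet(C + eps * torch.eye(H))
    return lam_bt * bt_term + lam_ld * ld_term
```

In this reading, the penalty would be added to the cross-entropy loss during LoRA fine-tuning, so that training explicitly prices the redundancy externality that the potential-game analysis shows gradient descent otherwise ignores.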