Understanding relationships between attention heads is essential for interpreting the internal structure of Transformers, yet existing metrics do not capture this structure well. We focus on the subspaces spanned by attention-head weight matrices and quantify head-to-head relationships using the Projection Kernel (PK), a principal-angle-based measure of subspace similarity. Experiments show that PK reproduces known head-to-head interactions on the IOI task more clearly than prior metrics such as the Composition Score. We further introduce a framework to quantify the informativeness of PK distributions by comparing them with a reference distribution derived from random orthogonal subspaces. As an application, we analyze a directed graph constructed from PK and show that, in GPT2-small, L4H7 acts as a hub by functioning as an identity head.
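For intuition, a minimal sketch of how a Projection Kernel between two head-weight subspaces could be computed from principal angles (assuming the standard Grassmann projection-kernel definition, i.e., the sum of squared cosines of the principal angles; the function name, the normalization by head dimension, and the matrix shapes are illustrative and not taken from the paper):

```python
import numpy as np

def projection_kernel(A, B):
    """Projection Kernel (PK) between the column spaces of A and B.

    A, B: (d_model, d_head) weight matrices of two attention heads.
    Returns sum_i cos^2(theta_i) over the principal angles theta_i
    between the two column spaces.
    """
    # Orthonormal bases for the two subspaces (reduced QR).
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)  # guard against round-off
    return float(np.sum(cosines ** 2))

# Hypothetical usage: two random 768x64 head matrices (GPT2-small shapes).
rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((768, 64)), rng.standard_normal((768, 64))
pk = projection_kernel(W1, W2)
# For random k-dim subspaces in d dimensions, E[PK] = k^2 / d,
# so the per-dimension value below is roughly 64/768 ~ 0.083.
print(pk / 64)
```

Such a random-subspace baseline is the kind of reference distribution the abstract alludes to when assessing how informative the observed PK values are.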