Leave-One-Out (LOO) provides an intuitive measure of feature importance but is computationally prohibitive. While Layer-Wise Relevance Propagation (LRP) offers a potentially efficient alternative, its axiomatic soundness in modern Transformers remains largely under-examined. In this work, we first show that the bilinear propagation rules used in recent advances in AttnLRP violate the implementation invariance axiom. We prove this analytically and confirm it empirically in linear attention layers. Second, we revisit CP-LRP as a diagnostic baseline and find that bypassing relevance propagation through the softmax layer -- backpropagating relevance only through the value matrices -- significantly improves alignment with LOO, particularly in middle-to-late Transformer layers. Overall, our results suggest that (i) bilinear factorization sensitivity and (ii) softmax propagation error may jointly undermine LRP's ability to approximate LOO in Transformers.
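To make the softmax-bypass idea above concrete, the following is a minimal, illustrative PyTorch sketch (not the paper's implementation): under a simple gradient-times-input view of relevance, the attention weights produced by the softmax are detached so that relevance flows back only through the value path. The tensor sizes, the `attention` helper, and the `bypass_softmax` flag are hypothetical.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy single-head attention over a short sequence (hypothetical sizes).
n, d = 5, 8
Q = torch.randn(n, d, requires_grad=True)
K = torch.randn(n, d, requires_grad=True)
V = torch.randn(n, d, requires_grad=True)

def attention(Q, K, V, bypass_softmax: bool):
    # Scaled dot-product attention.
    A = F.softmax(Q @ K.T / d ** 0.5, dim=-1)
    if bypass_softmax:
        # CP-LRP-style bypass (as described in the abstract): treat the
        # attention weights as constants, so no relevance is propagated
        # through the softmax; it flows only via the value matrix V.
        A = A.detach()
    return A @ V

# Gradient-times-input as a crude stand-in for per-token relevance on V.
out = attention(Q, K, V, bypass_softmax=True).sum()
out.backward()
relevance_V = (V.grad * V).sum(dim=-1)
print(relevance_V)  # one relevance score per input token
```

The sketch uses gradient-times-input only as a stand-in; the actual CP-LRP and AttnLRP rules apply layer-wise relevance redistribution, but the structural point is the same: with the softmax branch detached, all relevance reaching the attention output is attributed through the value matrices.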