Transformer-based models excel in various natural language processing (NLP) tasks, attracting countless efforts to explain their inner workings. Prior methods explain Transformers by using raw gradients and attention as token attribution scores, but they often include irrelevant information in the explanation computation, which produces confusing results. In this work, we propose to highlight important information and eliminate irrelevant information by refining the information flow on top of the layer-wise relevance propagation (LRP) method. Specifically, we identify syntactic and positional heads as the important attention heads and focus on the relevance obtained from them. Experimental results demonstrate that irrelevant information does distort output attribution scores and should therefore be masked during explanation computation. On both classification and question-answering datasets, our method consistently outperforms eight baselines, with improvements ranging from over 3% to 33% on explanation metrics. Our anonymous code repository is available at: https://github.com/LinxinS97/Mask-LRP
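The masking idea described above can be illustrated with a toy sketch. This is not the implementation from the linked repository: the function name `mask_head_relevance`, the `(num_heads, seq_len)` layout of per-head relevance, and the hand-picked set of important heads are all assumptions made for illustration. Full LRP propagates relevance through every layer, whereas this sketch only shows the step of discarding relevance from heads that are not marked as important before summing per token.

```python
import numpy as np

def mask_head_relevance(head_relevance, important_heads):
    """Sum per-token relevance over important heads only (hypothetical helper).

    head_relevance: array-like of shape (num_heads, seq_len), assumed to hold
        the relevance each attention head assigns to each token.
    important_heads: iterable of head indices to keep, e.g. heads identified
        as syntactic or positional; all other heads are zeroed out.
    """
    head_relevance = np.asarray(head_relevance, dtype=float)
    keep = np.zeros(head_relevance.shape[0], dtype=bool)
    keep[np.array(sorted(important_heads), dtype=int)] = True
    # Broadcast the head mask over the token axis, then aggregate per token.
    masked = head_relevance * keep[:, None]
    return masked.sum(axis=0)

# Toy example: three heads, two tokens; head 2 is treated as irrelevant.
relevance = [[0.5, 0.1],
             [0.2, 0.3],
             [9.0, 9.0]]
token_scores = mask_head_relevance(relevance, important_heads={0, 1})
```

In this toy example the large values contributed by head 2 no longer dominate the token scores, which is the kind of distortion the abstract argues should be masked out.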