Transformer-based models excel at a wide range of natural language processing (NLP) tasks, attracting extensive efforts to explain their inner workings. Prior methods explain Transformers by using raw gradients or attention weights as token attribution scores, but irrelevant information is often included in the explanation computation, yielding misleading attributions. In this work, we propose to highlight important information and eliminate irrelevant information through a refined information flow built on top of layer-wise relevance propagation (LRP). Specifically, we identify syntactic and positional heads as important attention heads and focus on the relevance propagated through these heads. Experimental results demonstrate that irrelevant information does distort output attribution scores and should therefore be masked during explanation computation. Compared with eight baselines on both classification and question-answering datasets, our method consistently outperforms them, improving explanation metrics by 3\% to 33\%. Our anonymous code repository is available at: https://github.com/LinxinS97/Mask-LRP
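To make the head-masking idea concrete, below is a minimal, hypothetical sketch of how relevance from non-important attention heads could be zeroed out during an LRP backward pass. It assumes a per-layer relevance tensor of shape [heads, seq, seq] and a boolean mask marking the syntactic/positional heads; the function name `mask_head_relevance` and the renormalization step are illustrative assumptions, not the authors' actual API.

```python
import torch


def mask_head_relevance(rel_per_head: torch.Tensor,
                        important_heads: torch.Tensor) -> torch.Tensor:
    """Zero out relevance flowing through non-important attention heads.

    rel_per_head:    [num_heads, seq_len, seq_len] LRP relevance per head.
    important_heads: [num_heads] boolean mask (True = syntactic/positional).
    """
    mask = important_heads.view(-1, 1, 1).to(rel_per_head.dtype)
    masked = rel_per_head * mask
    # Rescale so total relevance is conserved after masking (a common LRP
    # convention; the paper may handle conservation differently).
    total_before = rel_per_head.sum()
    total_after = masked.sum()
    if total_after.abs() > 1e-12:
        masked = masked * (total_before / total_after)
    return masked


# Toy usage: 12 heads, 5 tokens; keep heads 2 and 7 as "important".
rel = torch.rand(12, 5, 5)
keep = torch.zeros(12, dtype=torch.bool)
keep[[2, 7]] = True
token_scores = mask_head_relevance(rel, keep).sum(dim=(0, 1))  # per-token attribution
```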