Transformer-based models excel at a wide range of natural language processing (NLP) tasks, which has motivated numerous efforts to explain their inner workings. Prior methods explain Transformers by using raw gradients and attention weights as token attribution scores, but these computations often include irrelevant information, which yields confusing explanations. In this work, we propose to highlight important information and eliminate irrelevant information through a refined information flow built on top of the layer-wise relevance propagation (LRP) method. Specifically, we identify syntactic and positional heads as the important attention heads and focus on the relevance obtained from them. Experimental results demonstrate that irrelevant information does distort output attribution scores and should therefore be masked during explanation computation. Compared with eight baselines on both classification and question-answering datasets, our method consistently outperforms them, with improvements of 3% to 33% on explanation metrics. Our anonymous code repository is available at: https://github.com/LinxinS97/Mask-LRP
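As a rough illustration of the masking idea described in the abstract, the sketch below sums per-head token relevance while discarding heads that are not marked as important. It is a simplified toy under stated assumptions, not the implementation from the repository: the function name `masked_token_relevance`, the nested-list layout `head_relevance[layer][head][token]`, and the set of `(layer, head)` pairs are all hypothetical, and the actual method applies masking inside the LRP propagation rather than on precomputed per-head scores.

```python
from typing import List, Set, Tuple


def masked_token_relevance(
    head_relevance: List[List[List[float]]],
    important_heads: Set[Tuple[int, int]],
) -> List[float]:
    """Sum per-head token relevance, keeping only heads marked as important.

    head_relevance[l][h][t] is a hypothetical relevance score that head h
    in layer l assigns to token t. Heads whose (layer, head) index is not in
    important_heads are masked, i.e. contribute nothing to the result.
    """
    seq_len = len(head_relevance[0][0])
    scores = [0.0] * seq_len
    for layer_idx, layer in enumerate(head_relevance):
        for head_idx, token_scores in enumerate(layer):
            if (layer_idx, head_idx) not in important_heads:
                continue  # masked head: its relevance is dropped
            for t, r in enumerate(token_scores):
                scores[t] += r
    return scores


# Toy input (made up for illustration): 1 layer, 2 heads, 3 tokens.
relevance = [[[0.5, 0.25, 0.0], [0.125, 0.125, 1.0]]]

# Keep only head (0, 0), assumed here to be a syntactic or positional head.
masked = masked_token_relevance(relevance, {(0, 0)})
# Keep every head, i.e. no masking.
unmasked = masked_token_relevance(relevance, {(0, 0), (0, 1)})

print(masked)    # [0.5, 0.25, 0.0]
print(unmasked)  # [0.625, 0.375, 1.0]
```

In this toy input, the unmasked scores are dominated by the third token only because of the second head; masking that head removes its influence, which mirrors the claim that irrelevant information can distort attribution scores.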