Several explanation methods, such as Integrated Gradients (IG), can be characterised as path-based methods, as they rely on a straight line between the data and an uninformative baseline. However, when applied to language models, these methods produce a path for each word of a sentence simultaneously, which can yield sentences of interpolated words that either have no clear meaning or differ significantly in meaning from the original sentence. To keep the meaning of these sentences as close as possible to the original, we propose Sequential Integrated Gradients (SIG), which computes the importance of each word in a sentence by keeping every other word fixed and creating interpolations only between the baseline and the word of interest. Moreover, inspired by the training procedure of several language models, we also propose replacing the baseline token "pad" with the trained token "mask". Although SIG is a simple improvement over the original IG method, we show on various models and datasets that it is a very effective method for explaining language models.
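The difference between the two path constructions described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy model `model`, its analytic gradient `grad_model`, the weight matrix `W`, the embedding `mask_vec`, and the all-zero "pad" baseline are all hypothetical stand-ins chosen so the example runs without a real language model. IG interpolates every word embedding at once, whereas the SIG-style variant replaces only the word of interest with the mask embedding and interpolates along that single word.

```python
import numpy as np


def model(X, W):
    """Toy scalar 'model' over word embeddings X (n_words, dim). Hypothetical."""
    return np.tanh(np.sum(X * W))


def grad_model(X, W):
    """Analytic gradient of the toy model with respect to X."""
    return (1.0 - np.tanh(np.sum(X * W)) ** 2) * W


def integrated_gradients(X, baseline, W, steps=200):
    """IG: one straight path from the full baseline sentence to X."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint Riemann sum
    avg_grad = np.mean(
        [grad_model(baseline + a * (X - baseline), W) for a in alphas], axis=0
    )
    # Sum over embedding dimensions to get one score per word.
    return ((X - baseline) * avg_grad).sum(axis=1)


def sequential_integrated_gradients(X, mask_vec, W, steps=200):
    """SIG-style: for each word, interpolate only that word; others stay fixed."""
    alphas = (np.arange(steps) + 0.5) / steps
    scores = np.zeros(X.shape[0])
    for i in range(X.shape[0]):
        base_i = X.copy()
        base_i[i] = mask_vec  # baseline differs from X only at word i
        avg_grad_i = np.mean(
            [grad_model(base_i + a * (X - base_i), W)[i] for a in alphas], axis=0
        )
        scores[i] = np.sum((X[i] - mask_vec) * avg_grad_i)
    return scores


# Illustrative data: 4 "words" with 3-dimensional embeddings (assumed values).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W = 0.3 * rng.normal(size=(4, 3))
mask_vec = 0.1 * rng.normal(size=3)   # stand-in for a trained "mask" embedding
pad_baseline = np.zeros_like(X)       # stand-in for an all-"pad" sentence

ig_scores = integrated_gradients(X, pad_baseline, W)
sig_scores = sequential_integrated_gradients(X, mask_vec, W)
print("IG :", ig_scores)
print("SIG:", sig_scores)
```

In this sketch each SIG score approximates the change in the model output when a single word is swapped from the mask embedding to its actual embedding, while the IG scores jointly sum to the output change between the full baseline sentence and the input. A real use would obtain the gradients from a language model through automatic differentiation rather than a closed-form expression.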