Despite the success of Siamese encoder models such as sentence transformers (STs), little is known about which aspects of their inputs they pay attention to. One barrier is that their predictions cannot be attributed to individual features, because they compare two inputs rather than processing a single one. This paper derives a local attribution method for Siamese encoders by generalizing the principle of integrated gradients to models with multiple inputs. The solution takes the form of feature-pair attributions and can be reduced to a token-token matrix for STs. Our method introduces integrated Jacobians and inherits the advantageous formal properties of integrated gradients: it accounts for the model's full computation graph and is guaranteed to converge to the actual prediction. A pilot study shows that in an ST, a few token pairs can often explain large fractions of predictions, and that the model focuses on nouns and verbs. For accurate predictions, however, it needs to attend to the majority of tokens and parts of speech.
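The abstract describes the construction only in words, so the following NumPy sketch illustrates the idea on a hypothetical toy encoder. Everything here is our own assumption and not the paper's implementation: a one-layer tanh encoder, dot-product similarity, all-zero reference inputs, and a midpoint Riemann sum for the path integral. For each input, the integrated Jacobian averages the encoder's Jacobian along the straight path from a reference r to the input x, so that f(x) - f(r) is approximately J (x - r). Combining the two integrated Jacobians gives a feature-pair matrix whose entries sum to the similarity relative to the references. Summing within token blocks then yields a token-token matrix.

```python
import numpy as np

# Hypothetical toy setup: each input is T tokens with D-dim embeddings,
# flattened to a vector of length T*D; the encoder outputs a K-dim embedding.
rng = np.random.default_rng(0)
T, D, K = 3, 4, 5
W = rng.normal(scale=0.5, size=(K, T * D))
c = rng.normal(scale=0.1, size=K)

def encode(x):
    """Toy encoder f: R^(T*D) -> R^K (assumption, not a real ST)."""
    return np.tanh(W @ x + c)

def jacobian(x):
    """Analytic Jacobian of encode at x, shape (K, T*D)."""
    h = np.tanh(W @ x + c)
    return (1.0 - h ** 2)[:, None] * W

def integrated_jacobian(x, ref, steps=1000):
    """Midpoint-rule approximation of the integral over alpha in [0, 1]
    of the Jacobian evaluated at ref + alpha * (x - ref)."""
    alphas = (np.arange(steps) + 0.5) / steps
    return sum(jacobian(ref + a * (x - ref)) for a in alphas) / steps

def score(a, b):
    """Siamese similarity: dot product of the two embeddings."""
    return encode(a) @ encode(b)

def pair_attributions(a, b, ref_a, ref_b, steps=1000):
    """Feature-pair attribution matrix of shape (T*D, T*D):
    entry (i, j) = (a - ref_a)_i * (J_a^T J_b)_ij * (b - ref_b)_j."""
    ja = integrated_jacobian(a, ref_a, steps)
    jb = integrated_jacobian(b, ref_b, steps)
    return (a - ref_a)[:, None] * (ja.T @ jb) * (b - ref_b)[None, :]

def token_token(attr):
    """Sum feature pairs within each token pair, giving a (T, T) matrix."""
    return attr.reshape(T, D, T, D).sum(axis=(1, 3))

a = rng.normal(size=T * D)
b = rng.normal(size=T * D)
ref = np.zeros(T * D)  # assumption: all-zero reference input

attr = pair_attributions(a, b, ref, ref)
tok = token_token(attr)

# The attributions sum to the score relative to the references:
# s(a, b) - s(r, b) - s(a, r) + s(r, r).
target = score(a, b) - score(ref, b) - score(a, ref) + score(ref, ref)
print(tok.shape, attr.sum(), target)
```

The function names (`encode`, `integrated_jacobian`, `pair_attributions`, `token_token`) are hypothetical. Increasing `steps` tightens the agreement between `attr.sum()` and `target`, which mirrors the convergence property stated in the abstract.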