Siamese encoders such as sentence transformers are among the least understood deep models. Established attribution methods cannot handle this model class because it compares two inputs rather than processing a single one. To address this gap, we recently proposed an attribution method specifically for Siamese encoders (Möller et al., 2023). However, it requires models to be adjusted and fine-tuned and therefore cannot be applied directly to off-the-shelf models. In this work, we reassess these restrictions and propose (i) a model with exact attribution ability that retains the original model's predictive performance and (ii) a way to compute approximate attributions for off-the-shelf models. We extensively compare approximate and exact attributions and use them to analyze the models' attention to different linguistic aspects. We gain insights into which syntactic roles Siamese transformers attend to, confirm that they mostly ignore negation, explore how they judge semantically opposite adjectives, and find that they exhibit lexical bias.