In many real-world natural language processing applications, practitioners aim not only to maximize predictive performance but also to obtain faithful explanations for model predictions. Rationales and importance distributions produced by feature attribution methods (FAs) provide insight into how different parts of the input contribute to a prediction. Previous studies have explored how various factors affect faithfulness, mainly in the context of monolingual English models. However, differences in FA faithfulness between multilingual and monolingual models have yet to be explored. Our extensive experiments, covering five languages and five popular FAs, show that FA faithfulness varies between multilingual and monolingual models. We find that the larger the multilingual model, the less faithful the FAs are compared to its counterpart monolingual models. Our further analysis suggests that this faithfulness disparity is potentially driven by differences between model tokenizers. Our code is available at: https://github.com/casszhao/multilingual-faith.