Attribution scores indicate the importance of different input parts and can thus explain model behaviour. Prompt-based models are currently gaining popularity, among other reasons because they are easier to adapt in low-resource settings. However, the quality of attribution scores extracted from prompt-based models has not yet been investigated. In this work, we address this gap by analyzing attribution scores extracted from prompt-based models with respect to plausibility and faithfulness, and by comparing them with attribution scores extracted from fine-tuned models and large language models. In contrast to previous work, we introduce training size as an additional dimension of the analysis. We find that, in low-resource settings, the prompting paradigm (with either encoder-based or decoder-based models) yields more plausible explanations than fine-tuning the models, and that Shapley Value Sampling consistently outperforms attention and Integrated Gradients, yielding more plausible and faithful explanations.