Explaining the decision-making process of machine learning models is crucial for ensuring their reliability and fairness. One popular form of explanation highlights key input features, such as i) tokens (e.g., Shapley Values and Integrated Gradients), ii) interactions between tokens (e.g., Bivariate Shapley and attention-based methods), or iii) interactions between spans of the input (e.g., Louvain Span Interactions). However, these explanation types have only been studied in isolation, making it difficult to judge their respective applicability. To bridge this gap, we propose a unified framework, comprising four diagnostic properties, that facilitates a direct comparison between highlight and interactive explanations. Through extensive analysis of these three types of input feature explanations (each instantiated with three different explanation techniques) across two datasets and two models, we reveal that each explanation type excels on different diagnostic properties. In our experiments, highlight explanations are the most faithful to a model's prediction, while interactive explanations provide better utility for learning to simulate a model's predictions. These insights highlight the need for future research to develop combined methods that enhance all diagnostic properties.