Self-explainable deep neural networks are a recent class of models that can output ante-hoc local explanations that are faithful to the model's reasoning, and as such represent a step toward closing the gap between expressiveness and interpretability. Self-explainable graph neural networks (GNNs) aim to achieve the same for graph data. This raises the question: do these models fulfill their implicit guarantees in terms of faithfulness? In this extended abstract, we analyze the faithfulness of several self-explainable GNNs using different measures of faithfulness, identify several limitations -- both in the models themselves and in the evaluation metrics -- and outline possible ways forward.