Faithful and Consistent Graph Neural Network Explanations with Rationale Alignment

Uncovering the rationales behind predictions of graph neural networks (GNNs) has received increasing attention in recent years. Instance-level GNN explanation aims to discover the critical input elements, such as nodes or edges, that the target GNN relies upon for making predictions. Though various algorithms have been proposed, most of them formalize this task as searching for the minimal subgraph that preserves the original predictions. However, an inductive bias is deep-rooted in this framework: several subgraphs can result in the same or similar outputs as the original graph. Consequently, these methods risk providing spurious explanations and failing to provide consistent ones. Applying them to explain poorly performing GNNs further amplifies these issues. To address this problem, we theoretically examine the predictions of GNNs from a causality perspective. Two typical reasons for spurious explanations are identified: the confounding effect of latent variables, such as distribution shift, and causal factors distinct from the original input. Observing that both confounding effects and diverse causal rationales are encoded in internal representations, we propose a new explanation framework with an auxiliary alignment loss, which is theoretically proven to intrinsically optimize a more faithful explanation objective. For this alignment loss, several perspectives are explored: anchor-based alignment, distributional alignment based on Gaussian mixture models, mutual-information-based alignment, etc. A comprehensive study is conducted on both the effectiveness of this new framework in terms of explanation faithfulness and consistency and the advantages of these variants.
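To make the core idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the objective is a prediction-preservation term plus a weighted alignment term on internal embeddings. The function names, the weight `lam`, the toy values, and the use of a plain squared Euclidean distance (a stand-in for the anchor-based, Gaussian-mixture, or mutual-information variants named above) are all illustrative assumptions.

```python
import math


def prediction_loss(p_orig, p_sub, eps=1e-12):
    """Cross-entropy between the original prediction and the subgraph prediction."""
    return -sum(po * math.log(ps + eps) for po, ps in zip(p_orig, p_sub))


def alignment_loss(h_orig, h_sub):
    """Squared Euclidean distance between internal embeddings.

    Assumption: a simple stand-in for the alignment variants listed in the abstract.
    """
    return sum((a - b) ** 2 for a, b in zip(h_orig, h_sub))


def explanation_objective(p_orig, p_sub, h_orig, h_sub, lam=1.0):
    """Prediction-preservation term plus a weighted alignment term (hypothetical form)."""
    return prediction_loss(p_orig, p_sub) + lam * alignment_loss(h_orig, h_sub)


# Toy, made-up values: both candidate subgraphs reproduce the original output,
# but only candidate A stays close to the original internal representation.
p_orig = [0.9, 0.1]
h_orig = [1.0, 0.0, 2.0]
cand_a = {"p": [0.9, 0.1], "h": [1.0, 0.1, 1.9]}
cand_b = {"p": [0.9, 0.1], "h": [-1.0, 2.0, 0.0]}

loss_a = explanation_objective(p_orig, cand_a["p"], h_orig, cand_a["h"])
loss_b = explanation_objective(p_orig, cand_b["p"], h_orig, cand_b["h"])
```

With `lam=0.0` the two candidates are indistinguishable, which mirrors the ambiguity described above; with a positive `lam` the candidate whose embedding matches the original is preferred.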
