Graph neural networks (GNNs) are popular machine learning models for graphs, with many applications across scientific domains. However, GNNs are considered black-box models, and it is challenging to understand how they make predictions. Game-theoretic Shapley value approaches are popular explanation methods in other domains but are not well studied for graphs. Some studies have proposed Shapley-value-based GNN explanations, yet they have several limitations: they use a limited number of samples to approximate Shapley values; some focus mainly on small and large coalition sizes; and they are an order of magnitude slower than other explanation methods, making them inapplicable to even moderate-size graphs. In this work, we propose GNNShap, which provides explanations for edges, since edges offer more natural and more fine-grained explanations for graphs. We overcome these limitations by sampling from all coalition sizes, parallelizing the sampling on GPUs, and speeding up model predictions through batching. GNNShap gives better fidelity scores and faster explanations than baselines on real-world datasets. The code is available at https://github.com/HipGraph/GNNShap.
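To make the idea of sampling from all coalition sizes concrete, the following is a minimal sketch of a stratified Monte Carlo Shapley estimator over edges. It is not the GNNShap implementation: the function `stratified_shapley`, the parameter `samples_per_size`, and the toy value function `toy_value` with its `EDGE_WEIGHTS` are hypothetical names introduced for illustration, and the sketch runs sequentially on the CPU, without the GPU parallelism or batched model predictions described above. In a real setting, the value function would presumably be the GNN's prediction when only the edges in the coalition are kept.

```python
import random


def stratified_shapley(value_fn, num_players, samples_per_size=20, seed=0):
    """Estimate Shapley values by sampling coalitions of every size.

    value_fn takes a frozenset of player indices (here standing in for the
    edges kept in the graph) and returns a float score.
    """
    rng = random.Random(seed)
    players = list(range(num_players))
    phi = [0.0] * num_players
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        # Every coalition size k = 0..n-1 has equal weight 1/n in the
        # Shapley value, so each size gets its own sample average.
        for k in range(num_players):
            size_sum = 0.0
            for _ in range(samples_per_size):
                coalition = frozenset(rng.sample(others, k))
                # Marginal contribution of player i to this coalition.
                size_sum += value_fn(coalition | {i}) - value_fn(coalition)
            total += size_sum / samples_per_size
        phi[i] = total / num_players
    return phi


# Hypothetical toy "model": an additive score over four edges.
EDGE_WEIGHTS = [0.5, 0.25, 0.125, 0.0]


def toy_value(coalition):
    return sum(EDGE_WEIGHTS[e] for e in coalition)


shapley = stratified_shapley(toy_value, num_players=4)
```

Because the toy game is additive, each edge's marginal contribution is the same for every coalition, so the estimate recovers `EDGE_WEIGHTS` regardless of how many samples are drawn. For a non-additive value function such as a GNN, the estimate would vary with `samples_per_size` and the seed.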