Similarity search is a fundamental task for exploiting information in applications dealing with graph data, such as citation networks or knowledge graphs. While this task has been intensively studied, with approaches ranging from heuristics to graph embeddings and graph neural networks (GNNs), providing explanations for similarity has received less attention. In this work we are concerned with explainable similarity search over graphs, investigating how GNN-based methods for computing node similarities can be augmented with explanations. Specifically, we evaluate the performance of two prominent approaches to explanations in GNNs, based on mutual information (MI) and on gradients (GB). We discuss their suitability and empirically validate the properties of their explanations over popular graph benchmarks. We find that, unlike MI explanations, gradient-based explanations have three desirable properties. First, they are actionable: selecting inputs depending on them results in predictable changes in similarity scores. Second, they are consistent: the effect of selecting certain inputs overlaps very little with the effect of discarding them. Third, they can be pruned significantly to obtain sparse explanations that retain their effect on similarity scores.
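To make the gradient-based (GB) explanation idea concrete, here is a minimal sketch, not the paper's actual method: a toy one-layer GCN-style encoder with a dot-product similarity score, where the magnitude of the score's gradient with respect to each input feature (approximated here by central finite differences) serves as the explanation. All names (`A_hat`, `W`, `gradient_saliency`) are illustrative assumptions.

```python
import numpy as np

def encode(X, A_hat, W):
    # One linear message-passing layer: H = A_hat @ X @ W
    return A_hat @ X @ W

def similarity(X, A_hat, W, u, v):
    # Dot-product similarity between the embeddings of nodes u and v
    H = encode(X, A_hat, W)
    return float(H[u] @ H[v])

def gradient_saliency(X, A_hat, W, u, v, eps=1e-5):
    # Numerical gradient of the similarity score w.r.t. each input feature;
    # its absolute value is the gradient-based explanation for that feature.
    sal = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Xp = X.copy(); Xp[i, j] += eps
            Xm = X.copy(); Xm[i, j] -= eps
            sal[i, j] = (similarity(Xp, A_hat, W, u, v)
                         - similarity(Xm, A_hat, W, u, v)) / (2 * eps)
    return np.abs(sal)

# Toy graph: 4 nodes, 3 input features, 2 embedding dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
A = np.eye(4)
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0
A_hat = A / A.sum(axis=1, keepdims=True)  # row-normalized adjacency
W = rng.normal(size=(3, 2))

sal = gradient_saliency(X, A_hat, W, u=0, v=2)
```

Features with large saliency are the candidates that "selecting inputs" and "pruning" operate on: keeping only the top-scoring entries of `sal` and masking the rest is one way to probe whether a sparse explanation retains its effect on the similarity score.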