Counterfactual explanations (CEs) based on concepts consider alternative scenarios to identify which high-level semantic features contributed to a particular model prediction. In this work, we propose CEs based on the semantic graphs that accompany input data, aiming for more descriptive, accurate, and human-aligned explanations. Building upon state-of-the-art (SoTA) conceptual methods, we adopt a model-agnostic, edit-based approach and introduce the use of graph neural networks (GNNs) for efficient Graph Edit Distance (GED) computation. Focusing on the visual domain, we represent images as scene graphs and obtain their GNN embeddings, which lets us bypass solving the NP-hard graph similarity problem for all input pairs, an integral part of the CE computation process. We apply our method to benchmark and real-world datasets that vary in difficulty and in the availability of semantic annotations. Testing on diverse classifiers, we find that our CEs outperform previous SoTA semantics-based explanation models, including both white-box and black-box methods as well as conceptual and pixel-level approaches. Their superiority is demonstrated quantitatively and qualitatively, including validation by human subjects, highlighting the significance of leveraging semantic edges in the presence of intricate relationships. Our model-agnostic, graph-based approach is widely applicable and easily extensible, producing actionable explanations across different contexts.
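The retrieval step described above can be illustrated with a minimal sketch. It assumes that each scene graph has already been mapped to a fixed-length embedding by a GNN trained so that embedding distance approximates GED. The embeddings, image ids, and class labels below are hypothetical toy values, and the GNN itself is not shown. For a query image, the sketch returns the nearest graph whose predicted class differs, then lists the edge-level edits between the two graphs. Comparing sets of (subject, relation, object) triples is a deliberate simplification of full GED edits, not the authors' implementation.

```python
import math
from typing import Dict, List, Set, Tuple

# A scene-graph edge as a (subject, relation, object) triple.
Triple = Tuple[str, str, str]


def euclidean(a: List[float], b: List[float]) -> float:
    # Distance in embedding space, used here as a stand-in for GED (assumption).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_counterfactual(query: str,
                           embeddings: Dict[str, List[float]],
                           preds: Dict[str, str]) -> str:
    """Return the id of the closest graph whose predicted class differs from the query's."""
    candidates = [g for g in embeddings
                  if g != query and preds[g] != preds[query]]
    if not candidates:
        raise ValueError("no graph with a different prediction")
    return min(candidates, key=lambda g: euclidean(embeddings[query], embeddings[g]))


def edit_set(src: Set[Triple], dst: Set[Triple]) -> Dict[str, Set[Triple]]:
    """Edge-level edits that turn src into dst (simplified; ignores node relabeling costs)."""
    return {"remove": src - dst, "add": dst - src}


# Hypothetical toy data: scene graphs, precomputed embeddings, classifier outputs.
scene_graphs = {
    "img_a": {("man", "riding", "horse"), ("horse", "on", "grass")},
    "img_b": {("man", "riding", "bike"), ("bike", "on", "road")},
    "img_c": {("man", "next to", "horse"), ("horse", "on", "grass")},
}
embeddings = {"img_a": [0.0, 0.0], "img_b": [3.0, 4.0], "img_c": [1.0, 0.0]}
preds = {"img_a": "equestrian", "img_b": "cycling", "img_c": "farm"}

cf = nearest_counterfactual("img_a", embeddings, preds)
edits = edit_set(scene_graphs["img_a"], scene_graphs[cf])
print(cf, sorted(edits["remove"]), sorted(edits["add"]))
```

Here `img_c` is retrieved because it is closest in embedding space among graphs with a different predicted class. The resulting explanation reads as "replace *man riding horse* with *man next to horse*". Only the selected pair needs an exact edit computation, which reflects the motivation for avoiding all-pairs GED.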