Counterfactuals have been established as a popular explainability technique that leverages a set of minimal edits to alter the prediction of a classifier. When considering conceptual counterfactuals on images, the requested edits should correspond to salient concepts present in the input data. At the same time, conceptual distances are defined by knowledge graphs, ensuring the optimality of conceptual edits. In this work, we extend previous endeavors on graph edits as counterfactual explanations by conducting a comparative study that encompasses both supervised and unsupervised Graph Neural Network (GNN) approaches. To this end, we pose the following research question: when input data are represented as graphs, which GNN approach is optimal, in terms of performance and time efficiency, for generating minimal and meaningful counterfactual explanations for black-box image classifiers?