The considerable success of deep-learning-based methods in Visual Question Answering (VQA) has concurrently increased the demand for explainable methods. Most methods in Explainable Artificial Intelligence (XAI) focus on generating post-hoc explanations rather than taking an intrinsic approach, the latter being characteristic of interpretable models. In this work, we introduce an interpretable approach for graph-based VQA and demonstrate competitive performance on the GQA dataset. This approach bridges the gap between interpretability and performance. Our model is designed to intrinsically produce a subgraph as its explanation during the question-answering process, providing insight into its decision-making. To evaluate the quality of these generated subgraphs, we compare them against established post-hoc explainability methods for graph neural networks and perform a human evaluation. Moreover, we present quantitative metrics that correlate with the judgments of human assessors, serving as automatic metrics for the generated explanatory subgraphs. Our implementation is available at https://github.com/DigitalPhonetics/Intrinsic-Subgraph-Generation-for-VQA.