Explanations for Convolutional Neural Networks (CNNs) based on the relevance of input pixels may be too unspecific to evaluate which input features impact model decisions and how. Especially in complex real-world domains such as biomedicine, the presence of specific concepts (e.g., a certain type of cell) and of relations between concepts (e.g., one cell type being adjacent to another) may discriminate between classes (e.g., different types of tissue). Pixel relevance is not expressive enough to convey this type of information. As a consequence, model evaluation is limited, and relevant aspects that are present in the data and influence the model's decisions may be overlooked. This work presents a novel method to explain and evaluate CNN models using a concept- and relation-based explainer (CoReX). It explains the predictive behavior of a model on a set of images by masking (ir-)relevant concepts out of the decision-making process and by constraining relations in a learned interpretable surrogate model. We test our approach on several image data sets and CNN architectures. The results show that CoReX explanations are faithful to the CNN model in terms of predictive outcomes. We further demonstrate that CoReX is a suitable tool for evaluating CNNs, as it supports the identification and re-classification of incorrect or ambiguous classifications.
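The core intuition behind concept masking can be illustrated with a minimal sketch: occlude the pixels belonging to a candidate concept and measure how the model's score changes. This is not the CoReX algorithm itself; the function `concept_masking_effect` and the toy mean-intensity model are hypothetical stand-ins, assuming a NumPy image array and a boolean concept mask.

```python
import numpy as np

def concept_masking_effect(model, image, concept_mask, baseline=0.0):
    """Hypothetical sketch (not the actual CoReX method): measure how
    occluding a concept's pixels changes a model's class score."""
    masked = image.copy()
    masked[concept_mask] = baseline  # mask out pixels belonging to the concept
    return model(image) - model(masked)

# Toy stand-in model: class score is simply the mean pixel intensity.
toy_model = lambda img: img.mean()

img = np.ones((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :] = True  # assume the "concept" occupies the top half of the image

delta = concept_masking_effect(toy_model, img, mask)
print(delta)  # score drop attributable to the masked concept -> 0.5
```

A large score drop suggests the masked concept is relevant to the prediction; a near-zero drop suggests it is not, which is the kind of faithfulness signal the abstract refers to.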