Explainability of Deep Neural Networks (DNNs) has attracted increasing attention in recent years. Among the various explainability approaches, concept-based techniques stand out for their ability to use human-meaningful concepts rather than focusing solely on individual pixels. However, few methods consistently provide both local and global explanations. Moreover, most methods offer no way to explain misclassifications. To address these challenges, we present a unified concept-based system for unsupervised learning of both local and global concepts. Our primary objective is to uncover the intrinsic concepts underlying each data category by training surrogate explainer networks to estimate the importance of the concepts. Our experimental results substantiate the efficacy of the discovered concepts through diverse quantitative and qualitative assessments covering faithfulness, completeness, and generality. Furthermore, our approach can explain both correct and erroneous predictions, making it a valuable tool for understanding the characteristics of the target objects and classes.