We introduce Discovering Conceptual Network Explanations (DCNE), a new approach for generating human-comprehensible visual explanations that enhance the interpretability of deep neural image classifiers. Our method automatically finds the visual explanations that are critical for discriminating between classes. This is achieved by simultaneously optimizing three criteria: the explanations should be few, diverse, and human-interpretable. Our approach builds on the recently introduced Concept Relevance Propagation (CRP) explainability method. While CRP is effective at describing individual neuronal activations, it generates too many concepts, which impairs human comprehension. DCNE instead selects the few most important explanations. We introduce a new evaluation dataset centered on the challenging task of classifying birds, enabling us to compare the alignment of DCNE's explanations with those defined by human experts. Compared to existing eXplainable Artificial Intelligence (XAI) methods, DCNE strikes a desirable trade-off between conciseness and completeness when summarizing network explanations. It produces one-thirtieth as many explanations as CRP while incurring only a slight reduction in explanation quality. DCNE is a step toward making neural network decisions accessible and interpretable to humans, providing a valuable tool for researchers and practitioners in XAI and model alignment.