Recent advances in post-hoc and inherently interpretable methods have markedly improved the explanations of black-box classifier models. These methods operate either through post-hoc analysis or by integrating concept learning during model training. Although effective at bridging the semantic gap between a model's latent space and human interpretation, these explanation methods reveal the model's decision-making process only partially: the outcome is typically limited to high-level semantics derived from the last feature map. We argue that explanations lacking insight into the decision processes at low- and mid-level features are neither fully faithful nor useful. To address this gap, we introduce the Multi-Level Concept Prototypes Classifier (MCPNet), an inherently interpretable model. MCPNet autonomously learns meaningful concept prototypes across multiple feature-map levels using a Centered Kernel Alignment (CKA) loss and an energy-based weighted PCA mechanism, without relying on predefined concept labels. Furthermore, we propose a novel classifier paradigm that learns and aligns multi-level concept-prototype distributions for classification via a Class-aware Concept Distribution (CCD) loss. Our experiments show that MCPNet, while adaptable to various model architectures, offers comprehensive multi-level explanations without sacrificing classification accuracy. Moreover, its concept-distribution-based classification approach shows improved generalization in few-shot classification scenarios.
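As background for the CKA loss mentioned above, the following is a minimal sketch of the standard linear Centered Kernel Alignment similarity between two feature matrices (after Kornblith et al.'s formulation); the function name and NumPy setting are illustrative, not the paper's implementation:

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA similarity between feature matrices X (n x d1) and Y (n x d2)."""
    # Center each feature dimension over the n examples.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # HSIC with linear kernels, expressed via Frobenius norms of cross-covariances.
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)
```

Linear CKA is 1 for identical representations and is invariant to orthogonal transformations and isotropic scaling, which is what makes it suitable as a (dis)similarity objective over feature maps.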