Although standard Convolutional Neural Networks (CNNs) can be mathematically reinterpreted as Self-Explainable Models (SEMs), their built-in prototypes do not, by themselves, faithfully represent the data. Replacing the final linear layer with a $k$-means-based classifier addresses this limitation without compromising performance. This work introduces a unified formalization of $k$-means-based post-hoc explanations for the classifier, for the encoder's final output (B4), and for combinations of intermediate feature activations. The latter approach leverages the spatial consistency of convolutional receptive fields to generate concept-based explanation maps, which are supported by gradient-free feature attribution maps. Empirical evaluation with a ResNet34 shows that using shallower, less compressed feature activations, such as those from the last three blocks (B234), trades a slight reduction in predictive performance for greater semantic fidelity.
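The core idea of replacing a linear classification head with a $k$-means-based classifier can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-class $k$-means centroids over encoder features act as prototypes, with classification by nearest prototype; the feature dimensionality, class count, and synthetic data are all hypothetical stand-ins.

```python
# Hypothetical sketch: a k-means-based nearest-prototype classifier
# replacing a linear head over encoder features (e.g. B4 activations).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-ins for encoder outputs: two classes, 64-dim features.
X0 = rng.normal(loc=0.0, scale=1.0, size=(100, 64))
X1 = rng.normal(loc=3.0, scale=1.0, size=(100, 64))

# Fit k-means per class; the k centroids serve as class prototypes.
k = 3
protos = []
for X in (X0, X1):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    protos.append(km.cluster_centers_)
prototypes = np.stack(protos)  # shape: (n_classes, k, 64)

def predict(x):
    """Assign the class whose nearest prototype is closest to x."""
    d = np.linalg.norm(prototypes - x, axis=-1)  # (n_classes, k)
    return int(d.min(axis=1).argmin())

print(predict(np.zeros(64)), predict(np.full(64, 3.0)))
```

Because each prediction is grounded in a specific centroid, the nearest prototype doubles as a post-hoc explanation: the input is classified "because it resembles this cluster of training features."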