Transparency and explainability in image classification are essential for establishing trust in machine learning models and for detecting biases and errors. State-of-the-art explainability methods generate saliency maps that show where a specific class is identified, but they do not explain the model's decision process in detail. To address this need, we introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network. The explanations include a layer-wise representation of the features the model extracts from the input. These features are represented as saliency maps generated by clustering and merging similar feature maps, and each is assigned a weight derived by generalizing Grad-CAM to the proposed methodology. To further enhance the explanations, we include a set of textual labels collected through a gamified crowdsourcing activity and processed using NLP techniques and Sentence-BERT. Finally, we show an approach to generating global explanations by aggregating labels across multiple images.
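To make the clustering-and-weighting idea concrete, the following is a minimal sketch for a single layer, not the authors' implementation. It assumes that feature maps and gradients are already extracted and flattened into plain lists, that similarity is measured by cosine similarity with a greedy threshold rule, that a cluster is merged by element-wise averaging, and that a cluster's weight is the sum of its members' Grad-CAM channel weights. All function names and the threshold value are hypothetical; the abstract does not specify these choices.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two flattened feature maps (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_maps(maps, threshold=0.9):
    """Greedy clustering (an assumption): a map joins the first cluster
    whose seed map it resembles, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of channel indices; index 0 is the seed
    for k, fmap in enumerate(maps):
        for members in clusters:
            if cosine(fmap, maps[members[0]]) >= threshold:
                members.append(k)
                break
        else:
            clusters.append([k])
    return clusters


def gradcam_channel_weights(grads):
    """Standard Grad-CAM channel weight: the gradient of the class score
    with respect to a feature map, averaged over all spatial positions."""
    return [sum(g) / len(g) for g in grads]


def explain_layer(maps, grads, threshold=0.9):
    """Return one (member indices, merged saliency map, weight) tuple per cluster."""
    alphas = gradcam_channel_weights(grads)
    results = []
    for members in cluster_maps(maps, threshold):
        # Merge step (an assumption): element-wise mean of the member maps.
        merged = [
            sum(maps[k][i] for k in members) / len(members)
            for i in range(len(maps[0]))
        ]
        # Cluster weight (an assumption): sum of the members' Grad-CAM weights.
        weight = sum(alphas[k] for k in members)
        results.append((members, merged, weight))
    return results


# Toy layer with three flattened 2x2 feature maps and their gradients.
maps = [[1, 0, 0, 0], [2, 0, 0, 0], [0, 0, 1, 1]]
grads = [[0.5] * 4, [0.25] * 4, [-0.125] * 4]
for members, merged, weight in explain_layer(maps, grads):
    print(members, merged, weight)
```

In this toy input the first two maps activate the same location and are grouped together, while the third forms its own cluster with a negative weight. In a real setting the maps and gradients would come from a trained network, and the procedure would be repeated for every layer to obtain the layer-wise representation the abstract describes.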