The adoption of Deep Neural Networks (DNNs) in critical fields where predictions need to be accompanied by justifications is hindered by their inherent black-box nature. In this paper, we introduce P-TAME (Perturbation-based Trainable Attention Mechanism for Explanations), a model-agnostic method for explaining DNN-based image classifiers. P-TAME employs an auxiliary image classifier to extract features from the input image, bypassing the need to tailor the explanation method to the internal architecture of the backbone classifier being explained. Unlike traditional perturbation-based methods, which have high computational requirements, P-TAME offers an efficient alternative by generating high-resolution explanations in a single forward pass during inference. We apply P-TAME to explain the decisions of VGG-16, ResNet-50, and ViT-B-16, three distinct and widely used image classifiers. Quantitative and qualitative results show that our method matches or outperforms previous explainability methods, including model-specific approaches. Code and trained models will be released upon acceptance.