This paper studies the interpretability of convolutional networks by means of saliency maps. Most approaches based on Class Activation Maps (CAM) combine information from the fully connected layers with gradients computed through variants of backpropagation. However, it is well understood that gradients are noisy, and alternatives such as guided backpropagation have been proposed to obtain better visualizations at inference. In this work, we present a novel training approach that improves the quality of gradients for interpretability. In particular, we introduce a regularization loss that encourages the gradient with respect to the input image obtained by standard backpropagation to be similar to the gradient obtained by guided backpropagation. We find that the resulting gradient is qualitatively less noisy and quantitatively improves the interpretability of different networks, as measured by several interpretability methods.
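As a concrete illustration of the regularizer described above, the following PyTorch sketch penalizes the distance between the standard input gradient and the guided-backpropagation input gradient during training. This is one plausible reading of the abstract, not the authors' released code: the names (`GuidedReLU`, `SmallCNN`, `lambda_reg`), the choice of an L2 distance, and the decision to treat the guided gradient as a fixed (detached) target are all illustrative assumptions.

```python
# Minimal sketch: regularize standard input gradients toward guided-backprop
# gradients during training. Assumptions (not from the paper): L2 distance,
# detached guided-gradient target, per-example class-score gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GuidedReLUFn(torch.autograd.Function):
    """ReLU whose backward pass implements guided backpropagation:
    negative incoming gradients are zeroed in addition to the usual mask."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradient only where both the forward input and the
        # incoming gradient are positive (guided backpropagation).
        return grad_out.clamp(min=0) * (x > 0).to(grad_out.dtype)


class GuidedReLU(nn.Module):
    guided = False  # global switch: False -> plain ReLU, True -> guided

    def forward(self, x):
        return GuidedReLUFn.apply(x) if GuidedReLU.guided else F.relu(x)


class SmallCNN(nn.Module):
    """Toy network standing in for the convolutional models in the paper."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), GuidedReLU(),
            nn.Conv2d(16, 32, 3, padding=1), GuidedReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        return self.net(x)


def loss_with_gradient_regularizer(model, images, labels, lambda_reg=1.0):
    images = images.clone().requires_grad_(True)

    # Guided-backprop gradient of the labelled class score w.r.t. the input,
    # used here as a fixed (detached) target.
    GuidedReLU.guided = True
    guided_logits = model(images)
    guided_score = guided_logits.gather(1, labels[:, None]).sum()
    guided_grad = torch.autograd.grad(guided_score, images)[0].detach()
    GuidedReLU.guided = False

    # Standard forward pass: classification loss plus the input gradient,
    # computed with create_graph=True so the regularizer remains
    # differentiable w.r.t. the weights (double backpropagation).
    logits = model(images)
    task_loss = F.cross_entropy(logits, labels)
    score = logits.gather(1, labels[:, None]).sum()
    std_grad = torch.autograd.grad(score, images, create_graph=True)[0]

    reg = F.mse_loss(std_grad, guided_grad)
    return task_loss + lambda_reg * reg


if __name__ == "__main__":
    model = SmallCNN()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    images = torch.randn(4, 3, 32, 32)
    labels = torch.randint(0, 10, (4,))
    loss = loss_with_gradient_regularizer(model, images, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"total loss: {loss.item():.4f}")
```

Note that training through an input-gradient penalty requires double backpropagation (hence `create_graph=True`), which roughly doubles the cost of each step; detaching the guided gradient keeps that target branch out of the second-order computation.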