Gradient strikes back: How filtering out high frequencies improves explanations

Recent years have witnessed an explosion in the development of novel prediction-based attribution methods, which have gradually been supplanting older gradient-based methods for explaining the decisions of deep neural networks. However, it is still unclear why prediction-based methods outperform gradient-based ones. Here, we start from an empirical observation: the two approaches yield attribution maps with very different power spectra, with gradient-based methods revealing more high-frequency content than prediction-based methods. This observation raises several questions. What is the source of this high-frequency information, and does it truly reflect decisions made by the system? And why would the absence of high-frequency information in prediction-based methods yield better explainability scores across multiple metrics? We analyze the gradients of three representative visual classification models and observe that they contain noisy information concentrated at high frequencies. Furthermore, our analysis reveals that the downsampling operations used in Convolutional Neural Networks (CNNs) appear to be a significant source of this high-frequency content, suggesting aliasing as a possible underlying cause. We then apply an optimal low-pass filter to attribution maps and demonstrate that it improves gradient-based attribution methods. We show that (i) removing high-frequency noise yields significant improvements in the explainability scores obtained with gradient-based methods across multiple models, leading to (ii) a novel ranking of state-of-the-art methods with gradient-based methods at the top. We believe that our results will spur renewed interest in simpler and computationally more efficient gradient-based methods for explainability.
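The two spectral operations the abstract relies on, measuring how much of an attribution map's power sits at high frequencies and removing it with a low-pass filter, can be illustrated with a minimal NumPy sketch. This is not the authors' filter: the abstract does not say how the "optimal" filter is chosen, so here the filter is a plain ideal (sharp-cutoff) low-pass mask, the cutoff of 0.1 cycles/pixel is a hypothetical free parameter, and the helper names (`radial_frequency`, `high_frequency_energy`, `low_pass`) are our own. The synthetic "gradient map" is also an assumption: a smooth Gaussian blob standing in for object-level evidence, plus a pixel-level checkerboard standing in for the kind of high-frequency artifact that strided downsampling can introduce.

```python
import numpy as np


def radial_frequency(shape):
    """Radial spatial frequency (cycles/pixel) of each bin of an unshifted 2-D FFT."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in [-0.5, 0.5)
    return np.sqrt(fx**2 + fy**2)


def high_frequency_energy(attribution, cutoff):
    """Fraction of the map's spectral power that lies above `cutoff`."""
    power = np.abs(np.fft.fft2(attribution)) ** 2
    above = radial_frequency(attribution.shape) > cutoff
    return power[above].sum() / power.sum()


def low_pass(attribution, cutoff):
    """Ideal low-pass filter: zero every Fourier coefficient above `cutoff`.

    The mask is symmetric under f -> -f, so the inverse FFT is real up to
    floating-point round-off, which np.real discards.
    """
    spectrum = np.fft.fft2(attribution)
    mask = radial_frequency(attribution.shape) <= cutoff
    return np.real(np.fft.ifft2(spectrum * mask))


if __name__ == "__main__":
    n = 64
    yy, xx = np.mgrid[0:n, 0:n]
    # Smooth, object-level evidence: a Gaussian blob in the middle of the map.
    blob = np.exp(-((xx - n // 2) ** 2 + (yy - n // 2) ** 2) / (2 * 6.0**2))
    # Hypothetical stand-in for a downsampling artifact: a pixel-level checkerboard.
    checker = 0.3 * (-1.0) ** (xx + yy)
    gradient_map = blob + checker
    cutoff = 0.1  # cycles/pixel; a free parameter in this sketch, not a value from the paper
    filtered = low_pass(gradient_map, cutoff)
    print("high-frequency energy before:", high_frequency_energy(gradient_map, cutoff))
    print("high-frequency energy after: ", high_frequency_energy(filtered, cutoff))
    print("max deviation from the smooth blob:", np.abs(filtered - blob).max())
```

A sharp cutoff keeps the sketch short but can cause ringing near edges (the Gibbs phenomenon); a smooth window such as a Gaussian is a common alternative when that matters.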
