Gradient-based saliency maps are widely used to explain deep neural network decisions. However, as models become deeper and increasingly black-box, as with closed-source APIs such as ChatGPT, computing gradients becomes challenging, hindering conventional explanation methods. In this work, we introduce a novel unified framework for estimating gradients in black-box settings and generating saliency maps to interpret model decisions. We employ the likelihood ratio method to estimate output-to-input gradients and utilize them for saliency map generation. Additionally, we propose blockwise computation techniques to enhance estimation accuracy. Extensive experiments in black-box settings validate the effectiveness of our method, demonstrating accurate gradient estimation and the explainability of the generated saliency maps. Furthermore, we showcase the scalability of our approach by applying it to explain GPT-Vision, revealing the continued relevance of gradient-based explanation methods in the era of large, closed-source, black-box models.
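To make the core idea concrete, below is a minimal sketch of a likelihood-ratio (score-function) gradient estimator, which recovers gradients from black-box function evaluations alone. The function name, Gaussian perturbation scale `sigma`, sample count, and baseline subtraction are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def lr_gradient(f, x, sigma=0.1, n_samples=10000, seed=0):
    """Estimate grad f(x) using only black-box evaluations of f.

    Draws eps ~ N(0, I) and uses the Gaussian score-function identity
        grad_x E[f(x + sigma * eps)] = E[f(x + sigma * eps) * eps] / sigma,
    so no access to the model's internals or autodiff is required.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, x.size))
    # Each query is a forward evaluation only, as in a closed-source API.
    vals = np.array([f(x + sigma * e) for e in eps])
    # Subtracting the baseline f(x) reduces variance without adding bias.
    return ((vals - f(x))[:, None] * eps).mean(axis=0) / sigma

# Sanity check on f(x) = ||x||^2, whose analytic gradient is 2x.
x = np.array([1.0, -2.0, 0.5])
g = lr_gradient(lambda v: float(v @ v), x)
```

In a saliency-map setting, `f` would be the black-box model's scalar output (e.g., a class probability) and `x` the flattened input image; the estimated gradient is then reshaped and visualized as the saliency map.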