In this paper, we examine the reliability of Integrated Gradients (IG), a widely used feature attribution method for black-box deep learning models. We address two major challenges associated with IG: noisy feature visualizations for vision models and vulnerability to adversarial attributional attacks. Our approach adapts path-based feature attribution so that the attribution path aligns more closely with the intrinsic geometry of the data manifold. Our experiments use deep generative models applied to several real-world image datasets. They demonstrate that IG along geodesics conforms to the curved geometry of the Riemannian data manifold, generating more perceptually intuitive explanations and, in turn, substantially increasing robustness to targeted attributional attacks.
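To make the idea of path-based attribution concrete, the following is a minimal sketch and not the method evaluated in the paper. It accumulates gradients along an arbitrary discretized path, so standard IG (a straight line from baseline to input) and a curved alternative differ only in the path supplied. The toy function `f`, its gradient `grad_f`, and the sinusoidal `curved_path` are all hypothetical stand-ins: a real setting would use a network's output and, for the geodesic variant, a path derived from a generative model's learned manifold.

```python
import math

def f(x):
    # Toy differentiable "model" (assumption): stands in for a network's scalar output.
    return x[0] ** 2 * x[1] + math.sin(x[1])

def grad_f(x):
    # Analytic gradient of the toy model above.
    return [2 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def path_attributions(grad, path):
    """Midpoint-rule approximation of the gradient's line integral along a polyline.

    Each feature's attribution is the sum over segments of
    (partial derivative at the segment midpoint) * (change in that feature).
    """
    attr = [0.0] * len(path[0])
    for a, b in zip(path[:-1], path[1:]):
        mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
        g = grad(mid)
        for i in range(len(attr)):
            attr[i] += g[i] * (b[i] - a[i])
    return attr

def straight_path(baseline, x, steps=200):
    # The straight-line path used by standard Integrated Gradients.
    return [[b + (k / steps) * (xi - b) for b, xi in zip(baseline, x)]
            for k in range(steps + 1)]

def curved_path(baseline, x, steps=200, bend=0.5):
    # Hypothetical curved path: the straight line plus a bump in feature 0
    # that vanishes at both endpoints. It is NOT a geodesic; it only
    # illustrates that the path is a free choice.
    path = []
    for k in range(steps + 1):
        t = k / steps
        p = [b + t * (xi - b) for b, xi in zip(baseline, x)]
        p[0] += bend * math.sin(math.pi * t)
        path.append(p)
    return path

baseline = [0.0, 0.0]
x = [1.0, 2.0]

ig = path_attributions(grad_f, straight_path(baseline, x))
curved = path_attributions(grad_f, curved_path(baseline, x))

print("straight-line IG:", ig, "sum =", sum(ig))
print("curved path     :", curved, "sum =", sum(curved))
print("f(x) - f(baseline) =", f(x) - f(baseline))
```

Both paths satisfy the completeness property (the attributions sum to approximately `f(x) - f(baseline)`), yet they distribute that total differently across features. This is why the choice of path matters for the resulting explanation.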