Integrated Gradients (IG), a widely used axiomatic path-based attribution method, assigns importance scores to input features by integrating model gradients along a straight path from a baseline to the input. While IG is effective in some cases, we show that straight paths can lead to flawed attributions. In this paper, we identify the cause of these misattributions and propose an alternative approach that equips the input space with a model-induced Riemannian metric (derived from the explained model's Jacobian) and computes attributions by integrating gradients along geodesics under this metric. We call this method Geodesic Integrated Gradients (GIG). To approximate geodesic paths, we introduce two techniques: a k-Nearest Neighbours-based approach for smaller models and a Stochastic Variational Inference-based method for larger ones. Additionally, we propose a new axiom, No-Cancellation Completeness (NCC), which strengthens completeness by ruling out feature-wise cancellation. We prove that, for path-based attributions under the model-induced metric, NCC holds if and only if the integration path is a geodesic. Through experiments on both synthetic and real-world image classification data, we provide empirical evidence that supports our theoretical analysis and shows that GIG produces more faithful attributions than existing methods, including IG, on the benchmarks considered.
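For readers unfamiliar with the straight-path construction that IG uses, the following is a minimal sketch of standard IG only, not of GIG or its geodesic approximations. It assumes a toy model f(x) = tanh(w · x) with an analytic gradient, and the names `integrated_gradients`, `grad_f`, `w`, and `steps` are hypothetical choices made for illustration. The integral along the path is approximated with a midpoint Riemann sum.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Straight-path IG, approximated with a midpoint Riemann sum."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from baseline to x, shape (steps, d).
    path = baseline + alphas[:, None] * (x - baseline)
    # Average gradient along the path, shape (d,).
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    # Scale each feature's average gradient by its displacement.
    return (x - baseline) * avg_grad

# Toy model (an assumption for illustration): f(x) = tanh(w . x).
w = np.array([1.0, -2.0, 0.5])

def f(x):
    return np.tanh(w @ x)

def grad_f(x):
    # Analytic gradient of tanh(w . x) with respect to x.
    return (1.0 - np.tanh(w @ x) ** 2) * w

x = np.array([0.8, 0.1, -0.4])
baseline = np.zeros(3)

attributions = integrated_gradients(grad_f, x, baseline)

# Completeness: attributions should sum to f(x) - f(baseline),
# up to the numerical error of the Riemann sum.
completeness_gap = abs(attributions.sum() - (f(x) - f(baseline)))
print(attributions, completeness_gap)
```

Completeness constrains only the sum of the attributions, so positive and negative per-feature contributions can still offset one another along a straight path. This is the kind of cancellation that NCC is stated to rule out.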