Integrated Gradients (IG), a widely used axiomatic path-based attribution method, assigns importance scores to input features by integrating model gradients along a straight path from a baseline to the input. Although IG is effective in some cases, we show that straight paths can lead to flawed attributions. We identify the cause of these misattributions and propose an alternative approach that equips the input space with a model-induced Riemannian metric (derived from the explained model's Jacobian) and computes attributions by integrating gradients along geodesics under this metric. We call this method Geodesic Integrated Gradients (GIG). To approximate geodesic paths, we introduce two techniques: a k-Nearest Neighbours-based approach for smaller models and a Stochastic Variational Inference-based method for larger ones. We also propose a new axiom, No-Cancellation Completeness (NCC), which strengthens completeness by ruling out feature-wise cancellation. We prove that, for path-based attributions under the model-induced metric, NCC holds if and only if the integration path is a geodesic. Experiments on both synthetic and real-world image classification data provide empirical evidence supporting our theoretical analysis and show that GIG produces more faithful attributions than existing methods, including IG, on the benchmarks considered.
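As a rough illustration of the path-based attributions described in the abstract, the following sketch integrates gradients along a piecewise-linear path with a midpoint Riemann sum. Straight-line IG is the special case of a single segment from baseline to input. The toy model `f`, its gradient `grad_f`, the helper `path_attributions`, and the detour waypoint are all hypothetical choices made for illustration; this is not the paper's GIG implementation and it does not compute geodesics. It only shows that attributions depend on the chosen path while their sum stays equal to the change in model output (completeness).

```python
import numpy as np

def f(x):
    # Hypothetical toy model: a smooth scalar function of two features.
    return np.tanh(x[0] * x[1]) + 0.5 * x[0]

def grad_f(x):
    # Analytic gradient of the toy model above.
    s = 1.0 - np.tanh(x[0] * x[1]) ** 2
    return np.array([s * x[1] + 0.5, s * x[0]])

def path_attributions(grad_fn, waypoints, steps_per_segment=200):
    """Integrate gradients along a piecewise-linear path (midpoint rule).

    waypoints: sequence of points; the first is the baseline, the last the input.
    Returns one attribution score per input feature.
    """
    waypoints = np.asarray(waypoints, dtype=float)
    attributions = np.zeros(waypoints.shape[1])
    # Midpoints of equal sub-intervals of [0, 1] on each segment.
    alphas = (np.arange(steps_per_segment) + 0.5) / steps_per_segment
    for start, end in zip(waypoints[:-1], waypoints[1:]):
        delta = end - start
        for a in alphas:
            point = start + a * delta
            attributions += grad_fn(point) * delta / steps_per_segment
    return attributions

baseline = np.zeros(2)
x = np.array([1.0, 2.0])

# Straight path: standard Integrated Gradients.
ig = path_attributions(grad_f, [baseline, x])

# Same endpoints, different path (an arbitrary detour, not a geodesic).
detour = path_attributions(grad_f, [baseline, np.array([1.0, 0.0]), x])

print("straight path:", ig, "sum =", ig.sum())
print("detour path:  ", detour, "sum =", detour.sum())
print("f(x) - f(baseline) =", f(x) - f(baseline))
```

Both paths distribute the same total, `f(x) - f(baseline)`, differently across the two features, which is why the choice of path matters for the resulting attributions.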