Occlusion-based attribution methods offer an intuitive way to estimate feature importance: perturb input features and measure the resulting change in model output. Their reliability, however, depends strongly on how feature removal is implemented. Externally selected baselines can introduce bias, out-of-distribution samples, and unstable explanations, and in nonlinear models occluding a set of features can also alter the contributions of the non-occluded features. We refer to this effect as attribution shift, because the attribution scores of the non-occluded features drift from their initial values. To address these sources of instability, we introduce XtrAIn, a training-guided attribution method that moves the occlusion operation from the input space to the parameter space. Instead of replacing input values with hand-crafted baselines, XtrAIn follows the model's training trajectory and measures how feature-associated parameter updates affect the output logits. We further introduce Xstep, a lightweight approximation that reduces computational cost, and XtrAIn+, a target-focused variant that emphasizes updates aligned with the target class. Experiments on controlled image datasets and on PAM50 breast-cancer subtype classification show that the proposed methods produce cleaner and more interpretable attribution patterns than standard attribution baselines. Overall, XtrAIn provides a training-aware perspective on feature attribution and a diagnostic tool for studying how feature-level evidence is formed during training.
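The two input-space problems named in the abstract, baseline dependence and attribution shift, can be illustrated with a minimal sketch. It assumes a hypothetical toy model f(x) = x0·x1 + x2 and two hand-picked baselines (all zeros and all ones); none of these names or values come from the paper. The code implements plain input-space occlusion only and is not an implementation of XtrAIn, Xstep, or XtrAIn+.

```python
import numpy as np

def model(x):
    # Hypothetical toy model: one interaction term (nonlinear) plus one linear term.
    return x[0] * x[1] + x[2]

def occlusion_attribution(f, x, baseline, occluded=()):
    """Single-feature occlusion scores, computed after the features listed in
    `occluded` have already been replaced by their baseline values."""
    x_ref = np.array(x, dtype=float)
    for j in occluded:
        x_ref[j] = baseline[j]
    scores = np.zeros(len(x_ref))
    for i in range(len(x_ref)):
        x_pert = x_ref.copy()
        x_pert[i] = baseline[i]          # occlude feature i
        scores[i] = f(x_ref) - f(x_pert)  # drop in output caused by the occlusion
    return scores

x = np.array([2.0, 3.0, 1.0])
zero_baseline = np.zeros(3)  # assumed hand-crafted baseline
ones_baseline = np.ones(3)   # a second, equally arbitrary baseline

# Scores on the intact input.
initial = occlusion_attribution(model, x, zero_baseline)
# Scores after feature 1 has been occluded: feature 0's score changes
# even though feature 0 itself was not touched (attribution shift).
shifted = occlusion_attribution(model, x, zero_baseline, occluded=(1,))
# Scores under a different baseline: the explanation changes with the baseline.
rebased = occlusion_attribution(model, x, ones_baseline)

print("initial:", initial)
print("after occluding feature 1:", shifted)
print("with all-ones baseline:", rebased)
```

In this toy setting the score of feature 0 drops from 6 to 0 once feature 1 is occluded, while the score of the purely linear feature 2 stays at 1. Switching to the all-ones baseline gives different scores again, which is the baseline sensitivity the abstract describes.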