The Neglected Baseline in Model Interpretation

We observe that existing model interpretation methods generally ignore the baseline, and that this neglect often leads to imprecise or even incorrect interpretations. In this paper, we reformulate the task of model interpretation, together with the interpretation principles that apply to its results, to demonstrate the importance of the baseline. We further unify gradient-based methods, Integrated Gradients (IG) methods, and Taylor expansion, clarifying the connections among them and explicitly identifying the baseline of each method. On this basis, we analyze the flaws and errors in related model interpretation methods (IG, LayerCAM, ODAM, Difference Map). We advocate evaluating the quality of interpretation results precisely, through the attribution error between the attribution result and the attribution target, rather than through flawed evaluation methods such as those based on marginal effects or on the assumption of perfect model performance. We revise IG and develop a model interpretation method with a clear and reasonable baseline, achieving better results. Our method supports model interpretation based on features from any layer. Interpretations based on features from different layers are all reasonable, and the differences among them reflect the varying degrees of feature extraction at different stages.
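To make the role of the baseline concrete, the following is a minimal sketch of standard Integrated Gradients on a toy function, not the revised method proposed in the paper. The toy model `f`, the helper names, and the definition of `attribution_error` as the gap between the summed attributions and `f(x) - f(baseline)` are all assumptions made for illustration; the abstract does not give the paper's exact definitions.

```python
def f(x):
    # Toy "model" (hypothetical): a smooth scalar function of three features.
    return x[0] * x[1] + x[2] ** 2


def grad_f(x):
    # Analytic gradient of the toy model above.
    return [x[1], x[0], 2.0 * x[2]]


def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann-sum approximation of the IG path integral along the
    # straight line from `baseline` to `x`.
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Each attribution is (input - baseline) times the path-averaged gradient.
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


def gradient_times_input(x):
    # Gradient x input: no baseline is stated, but the (x - 0) factor
    # implicitly treats the all-zeros input as the reference point.
    return [xi * gi for xi, gi in zip(x, grad_f(x))]


def attribution_error(attr, x, baseline):
    # Assumed definition: distance between the summed attributions and the
    # attribution target f(x) - f(baseline).
    return abs(sum(attr) - (f(x) - f(baseline)))


x = [1.0, 2.0, 3.0]
zero_baseline = [0.0, 0.0, 0.0]
ones_baseline = [1.0, 1.0, 1.0]

ig_zero = integrated_gradients(x, zero_baseline)
ig_ones = integrated_gradients(x, ones_baseline)
gxi = gradient_times_input(x)

print("IG, zero baseline:", ig_zero, attribution_error(ig_zero, x, zero_baseline))
print("IG, ones baseline:", ig_ones, attribution_error(ig_ones, x, ones_baseline))
print("gradient x input: ", gxi, attribution_error(gxi, x, zero_baseline))
```

In this sketch the two IG runs assign different attributions to the same input (approximately `[1, 1, 9]` against the zero baseline and `[0, 1, 8]` against the all-ones baseline), yet each sums to its own target `f(x) - f(baseline)`. Gradient times input, measured against the zero baseline it implicitly assumes, sums to 22 rather than 11, which illustrates how an unstated baseline can produce a large attribution error.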
