We observe that existing model interpretation methods generally ignore the baseline, and this neglect often leads to imprecise or even incorrect interpretations. In this paper, we reformulate the task of model interpretation and the principles for evaluating interpretation results to demonstrate the importance of the baseline. We further unify gradient-based methods, Integrated Gradients (IG), and Taylor expansion, clarifying the connections among them and explicitly identifying the baseline of each method. On this basis, we analyze the flaws and errors in related model interpretation methods (IG, LayerCAM, ODAM, and Difference Map). We advocate evaluating the quality of an interpretation precisely by the attribution error between the attribution result and the attribution target, rather than by flawed evaluation methods, such as those based on the marginal effect or on the assumption of perfect model performance. We revise IG and develop a model interpretation method with a clear and reasonable baseline, which achieves better results. Our method supports interpretation based on features from any layer. Interpretations based on features from different layers are all reasonable, and the differences among them reflect the varying degrees of feature extraction at different feature-extraction stages.
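To illustrate why the baseline matters and what an attribution error measures, here is a minimal sketch. It uses the standard straight-line Integrated Gradients (not the revised method proposed in this paper) and assumes that the attribution target is f(x) minus f(baseline). The toy model `f`, its gradient `grad_f`, and the helpers `integrated_gradients` and `attribution_error` are hypothetical names chosen for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


def f(x: Vector) -> float:
    # Hypothetical toy "model": a smooth scalar function of three input features.
    return x[0] ** 2 + x[0] * x[1] + math.sin(x[2])


def grad_f(x: Vector) -> Vector:
    # Analytic gradient of the toy model above.
    return [2 * x[0] + x[1], x[0], math.cos(x[2])]


def integrated_gradients(grad: Callable[[Vector], Vector],
                         x: Vector, baseline: Vector, steps: int = 200) -> Vector:
    # Standard IG: (x_i - b_i) times the average gradient along the straight
    # path from the baseline to x, approximated with a midpoint Riemann sum.
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def attribution_error(model: Callable[[Vector], float], attributions: Vector,
                      x: Vector, baseline: Vector) -> float:
    # Assumed attribution target: the output change f(x) - f(baseline).
    target = model(x) - model(baseline)
    return abs(sum(attributions) - target)


x = [1.0, 2.0, 0.5]
zero_baseline = [0.0, 0.0, 0.0]
other_baseline = [1.0, 0.0, 0.0]

attr_zero = integrated_gradients(grad_f, x, zero_baseline)
attr_other = integrated_gradients(grad_f, x, other_baseline)

print("zero baseline: ", attr_zero, attribution_error(f, attr_zero, x, zero_baseline))
print("other baseline:", attr_other, attribution_error(f, attr_other, x, other_baseline))
```

With the zero baseline, feature 0 receives an attribution of about 2; with the baseline `[1.0, 0.0, 0.0]`, it receives exactly 0 because the input already equals the baseline in that coordinate. In both cases the summed attributions match their own target up to a small numerical-integration error. The same input can therefore yield very different explanations depending on a baseline that is often left implicit.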