Deep neural networks are highly successful on many vision tasks, but their black-box nature makes them hard to interpret. To address this, various post-hoc attribution methods have been proposed to identify the image regions most influential to a model's decisions. Evaluating such methods is challenging because no ground-truth attributions exist. We therefore propose three novel evaluation schemes that measure the faithfulness of these methods more reliably, make comparisons between them fairer, and make visual inspection more systematic. To address faithfulness, we propose a novel evaluation setting (DiFull) in which we carefully control which parts of the input can influence the output, so that possible attributions can be distinguished from impossible ones. To address fairness, we note that different methods are applied at different layers, which skews any comparison; we therefore evaluate all methods on the same layers (ML-Att) and discuss how this affects their performance on quantitative metrics. For more systematic visualizations, we propose a scheme (AggAtt) to qualitatively evaluate the methods on complete datasets. We use these evaluation schemes to study the strengths and shortcomings of several widely used attribution methods across a wide range of models. Finally, we propose a post-processing smoothing step that significantly improves the performance of some attribution methods, and discuss its applicability.
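To illustrate what a post-processing smoothing step on an attribution map can look like, the following is a minimal sketch only. The abstract does not specify the smoothing operator, so the box (mean) filter, the zero padding, the function name `smooth_attribution`, and the `kernel_size` parameter are all assumptions made for illustration, not the method evaluated in this work.

```python
import numpy as np


def smooth_attribution(attr, kernel_size=3):
    """Smooth a 2D attribution map with a box (mean) filter.

    Assumption: a box filter with zero padding stands in for the smoothing
    step; the actual operator and kernel size are not given in the abstract.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    attr = np.asarray(attr, dtype=float)
    if attr.ndim != 2:
        raise ValueError("attr must be a 2D array (height x width)")

    height, width = attr.shape
    pad = kernel_size // 2
    # Zero-pad so that the output has the same shape as the input.
    padded = np.pad(attr, pad, mode="constant")

    # Sum all shifted copies of the map that fall under the kernel window.
    out = np.zeros_like(attr)
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (kernel_size ** 2)


# Hypothetical attribution map: a single spike in the middle of a 5x5 grid.
raw = np.zeros((5, 5))
raw[2, 2] = 9.0
smoothed = smooth_attribution(raw, kernel_size=3)
```

In this toy case the isolated spike is spread evenly over its 3x3 neighbourhood. A step of this kind trades spatial precision for less noisy maps, which is one reason its applicability may differ between attribution methods.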