Prototypical parts-based networks are becoming increasingly popular due to their faithful self-explanations. However, their similarity maps are calculated in the penultimate network layer. Therefore, the activation of a prototype in a given region often depends on image parts outside that region, because the receptive field of penultimate-layer features extends beyond it. This can lead to misleading interpretations. We call this undesired behavior spatial explanation misalignment and introduce an interpretability benchmark with a set of dedicated metrics for quantifying this phenomenon. In addition, we propose a method for misalignment compensation and apply it to existing state-of-the-art models. We show the expressiveness of our benchmark and the effectiveness of the proposed compensation methodology through extensive empirical studies.
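The gap between the region a similarity map highlights and the region that actually drives it can be estimated with the standard receptive-field recurrence. Here `rf` grows by `(kernel - 1) * jump` per layer, and `jump` is the product of the strides so far. The sketch below assumes a VGG-16-style backbone (thirteen 3x3 convolutions in five blocks, each followed by a 2x2 max pool) and ignores padding, so it gives the theoretical receptive field. The layer list and the function name `receptive_field` are illustrative and are not taken from the paper. A 1x1 add-on layer, as used on top of some prototype backbones, leaves the result unchanged.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive field, jump) in input pixels for one output cell.

    `layers` is a sequence of (kernel_size, stride) pairs; padding is ignored,
    so this is the theoretical (unclipped) receptive field.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k-1) input-space steps
        jump *= stride             # distance in input pixels between adjacent cells
    return rf, jump


# Hypothetical VGG-16-style backbone: conv blocks of sizes 2, 2, 3, 3, 3,
# each followed by a 2x2 max pool with stride 2.
CONV, POOL = (3, 1), (2, 2)
vgg16_like: List[Tuple[int, int]] = []
for n_convs in (2, 2, 3, 3, 3):
    vgg16_like += [CONV] * n_convs + [POOL]

rf, jump = receptive_field(vgg16_like)
# The upsampled similarity map assigns each cell a jump x jump patch,
# yet that cell's value is computed from an rf x rf input window.
print(f"receptive field: {rf}px, upsampled cell footprint: {jump}px")
```

Under these assumptions, a cell that appears as a 32x32-pixel patch in the upsampled similarity map is computed from a 212x212-pixel input window, which covers nearly all of a 224x224 image.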