Despite their state-of-the-art performance, deep convolutional neural networks are susceptible to bias and can malfunction in unseen situations. The complex computation behind their reasoning is not sufficiently human-understandable to establish trust. External explainer methods attempt to interpret network decisions in a human-understandable way, but they have been criticized for fallacies arising from their assumptions and simplifications. Inherently self-interpretable models, on the other hand, are more robust to these fallacies but cannot be applied to already trained models. In this work, we propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that provides self-interpretability and enables knowledge injection while improving the model's performance. We also provide several weakly supervised knowledge injection methods to enhance the training process. We verified our claims by evaluating several LAP-extended models on three different datasets, including ImageNet. The proposed framework offers human-understandable interpretations that are more valid and more faithful to the model than those of commonly used white-box explainer methods.
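To make the idea of an attention-based pooling layer concrete, the following is a minimal NumPy sketch of attention-weighted pooling over local windows. It is not the LAP layer itself: the abstract does not specify its formulation, so the scoring projection `score_weights`, the non-overlapping windows, the softmax normalization, and the function name `local_attention_pool` are all assumptions made for illustration. The sketch only shows why such a layer can be self-interpretable: the per-location weights used for pooling can be read directly as a saliency map.

```python
import numpy as np


def local_attention_pool(features, score_weights, window=2):
    """Attention-weighted pooling over non-overlapping windows (illustrative only).

    features:      array of shape (C, H, W)
    score_weights: array of shape (C,), a hypothetical 1x1 scoring projection
    Returns (pooled, attention) with shapes (C, H // window, W // window) and (H, W).
    """
    c, h, w = features.shape
    if h % window or w % window:
        raise ValueError("H and W must be divisible by the window size")
    oh, ow = h // window, w // window

    # One scalar score per spatial location (assumed linear scoring).
    scores = np.tensordot(score_weights, features, axes=([0], [0]))  # (H, W)

    # Group the scores by window: (oh, ow, window * window).
    s = scores.reshape(oh, window, ow, window).transpose(0, 2, 1, 3)
    s = s.reshape(oh, ow, -1)

    # Numerically stable softmax inside each window.
    s = s - s.max(axis=-1, keepdims=True)
    a = np.exp(s)
    a /= a.sum(axis=-1, keepdims=True)

    # Group the features the same way and take the attention-weighted sum.
    f = features.reshape(c, oh, window, ow, window).transpose(0, 1, 3, 2, 4)
    f = f.reshape(c, oh, ow, -1)
    pooled = (f * a[None]).sum(axis=-1)

    # Scatter the weights back to the input resolution so they can be
    # inspected as a saliency map.
    attention = a.reshape(oh, ow, window, window).transpose(0, 2, 1, 3)
    attention = attention.reshape(h, w)
    return pooled, attention
```

In this sketch, all-zero scoring weights make the attention uniform, so the layer reduces to average pooling, while very large score differences push it toward selecting a single location per window. A learned scoring function sits between these extremes, and the returned `attention` array is the quantity one would visualize or supervise.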