Deep learning models are widely used nowadays for their reliability across a variety of tasks. However, they typically do not provide the reasoning behind their decisions, a significant drawback in sensitive areas such as biometrics, security, and healthcare. The most common interpretability approaches create visual attention heatmaps of regions of interest in an image, based on a model's gradient backpropagation. Although viable, current methods target image settings and default/standard deep learning models, meaning they require significant adaptation to work in video/multi-modal settings and with custom architectures. This paper proposes a model-agnostic interpretability approach based on a novel use of the Squeeze-and-Excitation (SE) block to create visual attention heatmaps. By inserting an SE block immediately before the classification layer of any model, we are able to retrieve the most influential features through manipulation of the SE vector, one of the key components of the SE block. Our results show that this new SE-based interpretability can be applied to various models in image and video/multi-modal settings, namely biometrics of facial features with CelebA and behavioral biometrics using Active Speaker Detection datasets. Furthermore, our proposal does not compromise model performance on the original task, and achieves competitive results against current interpretability approaches on state-of-the-art object datasets, highlighting its robustness on varied data beyond the biometric context.
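To make the mechanism the abstract relies on concrete, the following is a minimal NumPy sketch of a Squeeze-and-Excitation block and of one plausible way its excitation vector could be turned into a spatial attention heatmap. The layer sizes, weight shapes, and the channel-weighted-sum heatmap construction are illustrative assumptions for this sketch, not the paper's exact method.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation over feature_maps of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are the two FC layers of the
    excitation MLP (r is the reduction ratio). Returns the recalibrated
    feature maps and the per-channel excitation vector s.
    """
    # Squeeze: global average pooling -> channel descriptor z of shape (C,)
    z = feature_maps.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving weights s in (0, 1)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Recalibration: scale each channel by its excitation weight
    recalibrated = feature_maps * s[:, None, None]
    return recalibrated, s

def attention_heatmap(feature_maps, s):
    """Illustrative heatmap: sum channels weighted by their SE excitation,
    then normalise to [0, 1] for visualisation."""
    heat = (feature_maps * s[:, None, None]).sum(axis=0)
    heat = heat - heat.min()
    return heat / (heat.max() + 1e-8)

# Toy usage: 8 channels, 4x4 spatial maps, reduction ratio r = 2
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((4, 8)) * 0.1
w2 = rng.standard_normal((8, 4)) * 0.1
recal, s = se_block(fmap, w1, w2)
heat = attention_heatmap(fmap, s)
```

In a trained model, channels with excitation weights close to 1 are the ones the block deems most influential for the downstream classification, which is what makes the SE vector usable as an interpretability signal.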