Model-agnostic explanation methods for deep learning models are flexible regarding usability and availability. However, because they can only manipulate the input and observe changes in the output, they perform poorly on complex model architectures. For models with large inputs, as in object detection, sampling-based methods such as KernelSHAP are inefficient because they require many computationally expensive forward passes through the model. In this work, we present a framework for applying sampling-based explanation models in a computer vision context, using body part relevance assessment for pedestrian detection. Furthermore, we introduce a novel sampling-based method similar to KernelSHAP that is more robust at smaller sample sizes and is therefore more efficient for explainability analyses on large-scale datasets.
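To make the sampling-based setting concrete, the following is a minimal sketch of a standard KernelSHAP-style relevance estimate over body part regions. It is not the novel method introduced in this work, whose details are not given here. The names `score_fn` (a black-box detection score), `part_masks` (one boolean mask per body part), `baseline`, and `n_samples` are hypothetical and chosen for illustration only. Each sampled coalition costs one forward pass, which is why small sample sizes matter for large datasets.

```python
import math
import numpy as np


def shapley_kernel_weight(m, s):
    """Shapley kernel weight for a coalition of size s out of m parts."""
    if s == 0 or s == m:
        # Assumption: a large finite weight stands in for the infinite
        # weight of the empty and full coalitions.
        return 1e6
    return (m - 1) / (math.comb(m, s) * s * (m - s))


def kernel_shap_parts(score_fn, image, part_masks, n_samples=64,
                      baseline=0.0, seed=0):
    """Estimate per-part relevance by masking parts and fitting a
    weighted linear model to the black-box scores."""
    rng = np.random.default_rng(seed)
    m = len(part_masks)
    # Sample binary coalitions uniformly; 1 keeps a part, 0 masks it out.
    z = rng.integers(0, 2, size=(n_samples, m))
    z[0] = 0  # always include the empty coalition
    z[1] = 1  # always include the full coalition
    y = np.empty(n_samples)
    w = np.empty(n_samples)
    for i, row in enumerate(z):
        perturbed = image.copy()
        for j, keep in enumerate(row):
            if not keep:
                perturbed[part_masks[j]] = baseline
        y[i] = score_fn(perturbed)  # one forward pass per sample
        w[i] = shapley_kernel_weight(m, int(row.sum()))
    # Weighted least squares with an intercept column.
    X = np.hstack([np.ones((n_samples, 1)), z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]  # base value, per-part relevance


# Toy usage: a 4x4 "image" with three row-wise "body parts" and an
# additive stand-in for the detector score.
image = np.ones((4, 4))
rows = np.arange(4)[:, None] * np.ones((1, 4), dtype=int)
part_masks = [rows == 0, rows == 1, rows >= 2]
score_fn = lambda img: 1.0 * img[0].sum() + 2.0 * img[1].sum() + 0.5 * img[2:].sum()
base, relevance = kernel_shap_parts(score_fn, image, part_masks)
```

Because the toy score is additive over the parts, the fitted relevances should match each part's exact contribution. A real detector is not additive, so the estimate then depends on how many coalitions are sampled.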