Existing explanation tools for image classifiers usually give only a single explanation for an image's classification. For many images, however, both humans and image classifiers accept more than one explanation for the image label. Thus, restricting the number of explanations to just one is arbitrary and severely limits the insight into the behavior of the classifier. In this paper, we describe an algorithm and a tool, MultiReX, for computing multiple explanations of the output of a black-box image classifier for a given image. Our algorithm uses a principled approach based on causal theory. We analyse its theoretical complexity and provide experimental results showing that MultiReX finds multiple explanations on 96% of the images in the ImageNet-mini benchmark, whereas previous work finds multiple explanations only on 11%.