Existing algorithms for explaining the outputs of image classifiers are based on a variety of approaches and produce explanations that frequently lack formal rigour. Logic-based explanations, by contrast, are formally and rigorously defined, but their computability relies on strict assumptions about the model that do not hold for image classifiers. In this paper, we show that causal explanations, in addition to being formally and rigorously defined, enjoy the same formal properties as logic-based ones, while still lending themselves to black-box algorithms and being a natural fit for image classifiers. We prove formal properties of causal explanations and their equivalence to logic-based explanations. We demonstrate how to subdivide an image into its sufficient and necessary components. We introduce $\delta$-complete explanations, which meet a minimum confidence threshold, and 1-complete causal explanations, which are classified with the same confidence as the original image. We implement our definitions, and our experimental results demonstrate that different models exhibit different patterns of sufficiency, necessity, and completeness. Our algorithms are efficiently computable, taking on average 6s per image on a ResNet model to compute all types of explanations, and are entirely black-box: they require no knowledge of the model, no access to its internals or gradients, and no properties of the model such as monotonicity.
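To make the black-box flavour of these definitions concrete, the check underlying a $\delta$-complete explanation can be sketched as follows. This is an illustrative sketch only, not the paper's algorithm: the function names (`is_delta_complete`, `toy_model`) and the zero-baseline occlusion are assumptions for the example. The model is treated purely as a function from an image to class probabilities; no internals or gradients are touched.

```python
import numpy as np

def is_delta_complete(model, image, mask, delta):
    """Black-box check: does the masked image (the explanation candidate)
    retain at least `delta` times the original confidence for the
    originally predicted class? `model` is any callable mapping an
    image to a vector of class probabilities; no model internals,
    gradients, or properties such as monotonicity are assumed."""
    orig = model(image)
    label = int(np.argmax(orig))
    # Occlude everything outside the candidate region with a zero baseline
    # (the baseline choice is an assumption made for this sketch).
    masked = np.where(mask, image, 0.0)
    return model(masked)[label] >= delta * orig[label]

# Toy stand-in classifier: confidence for class 0 grows with mean brightness.
def toy_model(img):
    p = float(np.clip(img.mean(), 0.0, 1.0))
    return np.array([p, 1.0 - p])

img = np.full((4, 4), 0.8)
full_mask = np.ones_like(img, dtype=bool)      # keep the whole image
half_mask = np.zeros_like(img, dtype=bool)
half_mask[:2, :] = True                        # keep only the top half

print(is_delta_complete(toy_model, img, full_mask, 0.9))  # True
print(is_delta_complete(toy_model, img, half_mask, 0.9))  # False
```

With $\delta = 1$ the check corresponds to a 1-complete explanation, i.e. the masked image must be classified with at least the confidence of the original.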