Existing algorithms for explaining the output of image classifiers use different definitions of explanations and a variety of techniques to extract them. However, none of the existing tools takes a principled approach to explanation extraction based on formal definitions of causes and explanations. In this paper, we present a novel black-box approach to computing explanations grounded in the theory of actual causality. We prove relevant theoretical results and present an algorithm for computing approximate explanations based on these definitions. We prove termination of our algorithm and discuss its complexity and the degree of approximation relative to the precise definition. We implemented the framework in a tool, rex, and we present experimental results and a comparison with state-of-the-art tools. We demonstrate that rex is the most efficient tool and produces the smallest explanations, in addition to outperforming other black-box tools on standard quality measures.