Using black-box prediction models effectively and safely in practice requires being able to explain them. Most existing tools for model explanation are associational rather than causal, and we use two paradoxical examples to show that such explanations are generally inadequate. Motivated by the concept of genetic heritability in twin studies, we propose a new notion called counterfactual explainability for black-box prediction models. Counterfactual explainability has three key advantages: (1) it leverages counterfactual outcomes and extends methods for global sensitivity analysis (such as functional analysis of variance and Sobol's indices) to a causal setting; (2) it is defined not only for the totality of a set of input factors but also for their interactions (indeed, it is a probability measure on a whole "explanation algebra"); (3) it also applies to dependent input factors whose causal relationship can be modeled by a directed acyclic graph, thus incorporating causal mechanisms into the explanation.
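For readers unfamiliar with the global sensitivity analysis that the abstract says is being extended, the following is a minimal sketch of classical first-order Sobol indices, estimated with a standard pick-freeze Monte Carlo scheme. It is not the counterfactual explainability proposed here: it assumes independent inputs and is purely associational. The toy model `model`, the helper `first_order_sobol`, the standard normal inputs, and the sample size are all hypothetical choices made for illustration.

```python
import numpy as np


def model(x1, x2):
    # Hypothetical toy black box: two main effects plus one interaction.
    return x1 + x2 + x1 * x2


def first_order_sobol(f, n_samples=200_000, seed=0):
    """Pick-freeze estimate of first-order Sobol indices for a
    two-input function with independent standard normal inputs
    (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, 2))
    x_new = rng.standard_normal((n_samples, 2))  # independent copy

    y = f(x[:, 0], x[:, 1])
    total_var = y.var()

    indices = []
    for i in range(2):
        # Keep ("freeze") input i and redraw the other input.
        mixed = x_new.copy()
        mixed[:, i] = x[:, i]
        y_mixed = f(mixed[:, 0], mixed[:, 1])
        # Cov(Y, Y_mixed) estimates Var(E[Y | X_i]).
        cov = np.mean(y * y_mixed) - y.mean() * y_mixed.mean()
        indices.append(cov / total_var)
    return np.array(indices)


s1, s2 = first_order_sobol(model)
# With two independent inputs, the remaining variance share
# is attributed to the interaction term.
interaction = 1.0 - s1 - s2
print(f"S1={s1:.3f}  S2={s2:.3f}  interaction={interaction:.3f}")
```

For this toy model the variance is 3, and each main effect and the interaction analytically account for one third of it, so the three estimates should all land near 1/3. That additive split over main effects and interactions is the kind of decomposition the abstract describes carrying over to a causal setting.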