DiConStruct: Causal Concept-based Explanations through Black-Box Distillation

From arXiv. Accepted at the Conference on Causal Learning and Reasoning (CLeaR 2024, https://www.cclear.cc/2024). To be published in the Proceedings of Machine Learning Research (PMLR).

Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the explainer should capture the causal relations between these concepts so that the explanations can be reasoned about. Lastly, explanation methods should be efficient and should not compromise the performance of the predictive task. Despite the rapid advances in AI explainability in recent years, to the best of our knowledge no method fulfills all three properties. Indeed, mainstream methods for local concept explainability do not produce causal explanations and incur a trade-off between explainability and prediction performance. We present DiConStruct, an explanation method that is both concept-based and causal, with the goal of creating more interpretable local explanations in the form of structural causal models and concept attributions. Our explainer works as a distillation model for any black-box machine learning model: it approximates the black-box predictions while producing the corresponding explanations. Because of this, DiConStruct generates explanations efficiently without affecting the black-box prediction task. We validate our method on an image dataset and a tabular dataset, showing that DiConStruct approximates the black-box models with higher fidelity than other concept explainability baselines, while providing explanations that include the causal relations between the concepts.
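The general idea of distilling a black box into a concept-based surrogate and measuring its fidelity can be illustrated with a small sketch. This is not the DiConStruct method: it omits the structural causal model and the concept attributions, and everything in it (the `black_box` scoring function, the `concepts` extractor, the logistic surrogate, and the agreement-based `fidelity` measure) is a hypothetical stand-in chosen for illustration. It only shows that the surrogate is trained on the black-box outputs, so the black box itself is never modified.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def black_box(x):
    # Hypothetical opaque model; in practice only its outputs would be visible.
    return sigmoid(1.5 * x[0] - 2.0 * x[1] + 0.5 * x[2])


def concepts(x):
    # Hypothetical concept extractor mapping raw features to two concept values.
    return [x[0] - x[1], x[2]]


def surrogate(c, w, b):
    # Logistic model over concept values (assumed surrogate form).
    return sigmoid(sum(wi * ci for wi, ci in zip(w, c)) + b)


def fit_surrogate(C, targets, lr=0.5, epochs=300):
    # Full-batch gradient descent on cross-entropy against the soft
    # black-box scores: the distillation step.
    w, b, n = [0.0] * len(C[0]), 0.0, len(C)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for c, t in zip(C, targets):
            err = surrogate(c, w, b) - t
            grad_w = [g + err * ci for g, ci in zip(grad_w, c)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fidelity(X, w, b):
    # Fraction of inputs where surrogate and black box fall on the same
    # side of the 0.5 decision threshold.
    agree = sum(
        (surrogate(concepts(x), w, b) >= 0.5) == (black_box(x) >= 0.5)
        for x in X
    )
    return agree / len(X)


random.seed(0)
X_train = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
X_test = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]

w, b = fit_surrogate([concepts(x) for x in X_train],
                     [black_box(x) for x in X_train])
print("concept weights:", w, "bias:", b)
print("held-out fidelity:", fidelity(X_test, w, b))
```

In this toy setup the two concepts cannot reproduce the black box exactly, so fidelity stays below 1; that gap is the quantity a fidelity comparison between explainers would measure.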

