Causal Analysis for Robust Interpretability of Neural Networks

Interpreting the inner workings of neural networks is crucial for the trustworthy development and deployment of these black-box models. Prior interpretability methods rely on correlation-based measures to attribute model decisions to individual examples. However, these measures are susceptible to noise and spurious correlations that the model encodes during training (e.g., from biased inputs, overfitting, or misspecification). Moreover, they have been shown to produce noisy and unstable attributions that prevent a transparent understanding of the model's behavior. In this paper, we develop a robust intervention-based method, grounded in causal analysis, that captures cause-effect mechanisms in pre-trained neural networks and their relation to the prediction. Our approach relies on path interventions to infer the causal mechanisms within hidden layers and to isolate the information that is relevant and necessary for the model's prediction while discarding noisy information. The result is a set of task-specific causal explanatory graphs that can be used to audit model behavior and expose the actual causes underlying its performance. We apply our method to vision models trained on image classification tasks and provide extensive quantitative experiments showing that it captures more stable and faithful explanations than standard attribution-based methods. Furthermore, the underlying causal graphs reveal the neural interactions in the model, making the method a valuable tool for other applications (e.g., model repair).
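
To make the idea of an intervention on a hidden layer concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a toy two-layer ReLU network with hand-picked weights and uses a simple zero-ablation of one hidden unit as the intervention; the names `forward`, `causal_effects`, `W1`, `W2`, and the threshold are all hypothetical and chosen only for illustration.

```python
def relu(value):
    return max(0.0, value)


def forward(x, W1, W2, ablate=None):
    """Toy two-layer network (an assumption, not the paper's model).

    If `ablate` is an index, that hidden unit is forced to 0.0,
    which stands in for an intervention on the hidden layer.
    """
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if ablate is not None:
        hidden[ablate] = 0.0
    return sum(w * h for w, h in zip(W2, hidden))


def causal_effects(x, W1, W2):
    """Change in the output when each hidden unit is ablated in turn."""
    baseline = forward(x, W1, W2)
    return [baseline - forward(x, W1, W2, ablate=j) for j in range(len(W1))]


# Hand-picked illustrative weights: 2 inputs, 3 hidden units, 1 output.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
W2 = [2.0, 0.0, -1.0]
x = [1.0, 0.5]

effects = causal_effects(x, W1, W2)

# Keep only hidden units whose ablation changes the prediction;
# the threshold is an arbitrary illustrative choice.
threshold = 1e-9
relevant = [j for j, effect in enumerate(effects) if abs(effect) > threshold]

print(effects)
print(relevant)
```

In this toy setting, the second hidden unit is active but has no effect on the output when ablated, so an intervention-based measure discards it, whereas a measure based only on activation strength would not.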

