A wide variety of model explanation approaches have been proposed in recent years, guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is an instance of amortized interpretability, in which a neural network acts as a selector so that interpretations can be produced quickly at inference time. We show that several popular interpretability methods are particular cases of regularised maximum likelihood for our general model. We also propose new datasets with ground-truth selections, which make it possible to evaluate feature importance maps. Using these datasets, we show experimentally that multiple imputation yields more reasonable interpretations.
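As a rough illustration of the kind of likelihood described above, the Python sketch below computes a Monte Carlo estimate of log p(y | x). It assumes the model is built from three parts: a selector that samples a binary mask over the input features, an imputer that fills in the unselected features, and a predictor applied to the imputed input. It also assumes that multiple imputation means averaging the predictor over several imputations for each sampled mask. Every function name, the Bernoulli mask, the standard-normal imputer and the logistic predictor are hypothetical choices for this example, not the paper's implementation.

```python
import math
import random


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def selector_probs(x, w_sel):
    """Hypothetical selector: one Bernoulli selection probability per feature."""
    return [sigmoid(wi * xi) for wi, xi in zip(w_sel, x)]


def sample_mask(probs, rng):
    """Draw a binary selection mask z, with z_i ~ Bernoulli(probs[i])."""
    return [1 if rng.random() < p else 0 for p in probs]


def impute(x, z, rng):
    """Hypothetical imputer: keep selected features, redraw the others from N(0, 1)."""
    return [xi if zi else rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]


def predictor_prob(x_tilde, w_pred):
    """Hypothetical logistic predictor returning p(y = 1 | x_tilde)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w_pred, x_tilde)))


def mc_log_likelihood(x, y, w_sel, w_pred, n_masks=8, n_imputations=4, seed=0):
    """Monte Carlo estimate of log p(y | x).

    Averages the predictor's likelihood over sampled masks and, for each
    mask, over several imputations of the unselected features.
    """
    rng = random.Random(seed)
    probs = selector_probs(x, w_sel)
    total = 0.0
    for _ in range(n_masks):
        z = sample_mask(probs, rng)
        for _ in range(n_imputations):
            p1 = predictor_prob(impute(x, z, rng), w_pred)
            total += p1 if y == 1 else 1.0 - p1
    return math.log(total / (n_masks * n_imputations))


# Usage on a single toy example with hypothetical weights.
x = [2.0, -1.0, 0.5]
ll = mc_log_likelihood(x, y=1, w_sel=[1.0, 1.0, 1.0], w_pred=[0.8, -0.5, 0.0])
print(f"estimated log p(y=1 | x) = {ll:.4f}")
```

This sketch only evaluates the objective. Maximising it over the selector and predictor parameters would also require gradient estimates through the discrete mask, for example a score-function estimator or a continuous relaxation. A regularised variant would add a penalty on the selector, such as one favouring sparse masks. Taking the log of a sample average also gives a downward-biased estimate of the true log-likelihood.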