An Additive Instance-Wise Approach to Multi-class Model Interpretation

Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. Many interpretation methods focus on identifying explanatory input features, and they generally fall into two main categories: attribution and selection. A popular attribution-based approach is to exploit local neighborhoods for learning instance-specific explainers in an additive manner. Because a separate explainer must be learned for each instance, the process is inefficient and susceptible to poorly conditioned samples. Meanwhile, many selection-based methods directly optimize local feature distributions in an instance-wise training framework and can therefore leverage global information from other inputs. However, they can only interpret single-class predictions, and many suffer from inconsistency across different settings due to a strict reliance on a predefined number of selected features. This work combines the strengths of both approaches and proposes a framework for learning local explanations simultaneously for multiple target classes. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness while producing more compact and comprehensible explanations. We also demonstrate its capacity to select stable and important features through extensive experiments on various data sets and black-box model architectures.
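
To make the attribution-style baseline mentioned above concrete, here is a minimal sketch of a local additive explainer: it perturbs one instance with random binary masks, queries the black box, and fits a proximity-weighted linear surrogate with one column per class. It illustrates the general idea only, not the framework proposed in this work. The function name `local_additive_explainer`, the zero baseline for masked features, and the exponential kernel are all assumptions made for illustration.

```python
import numpy as np

def local_additive_explainer(predict, x, n_samples=500, kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate around x for every class of `predict`.

    predict: maps an (n, d) array to an (n, c) array of class scores.
    Returns a (d, c) matrix: attribution of each feature to each class.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Binary masks: 1 keeps a feature, 0 replaces it with a zero baseline (assumption).
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1  # make sure the unperturbed instance is in the neighborhood
    scores = predict(masks * x)  # black-box queries, shape (n_samples, c)
    # Proximity weights: samples with fewer masked features count more.
    distance = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column.
    design = np.hstack([np.ones((n_samples, 1)), masks])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(design * sw, scores * sw, rcond=None)
    return coef[1:]  # drop the intercept row

# Usage with a toy linear "black box" over 4 features and 2 classes.
W = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0], [2.0, 1.0]])
x = np.array([1.0, 2.0, -1.0, 0.5])
attributions = local_additive_explainer(lambda X: X @ W, x)
```

Note that every explained instance costs `n_samples` black-box queries plus a fresh fit, which is the inefficiency the abstract points to.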

