Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat

ML model design either starts with an interpretable model or starts with a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible than their Blackbox variants and to underperform them. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain the desired proportion of data. Our extensive experiments show that our route, interpret, and repeat approach (1) identifies a diverse set of instance-specific concepts with high concept completeness via MoIE without compromising performance, (2) identifies the relatively "harder" samples to explain via residuals, (3) outperforms the interpretable-by-design models by significant margins during test-time interventions, and (4) fixes the shortcut learned by the original Blackbox. The code for MoIE is publicly available at: https://github.com/batmanlab/ICML-2023-Route-interpret-repeat.
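The control flow described above (fit an interpretable expert, route the samples it covers, repeat on what is left) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how selectors or experts are trained, so the sketch swaps in a greedy sequential-covering rule learner over binary concepts. The function names `fit_expert` and `route_interpret_repeat`, the parameters `target_coverage` and `max_experts`, and the single-literal rule form are all hypothetical choices made for illustration.

```python
from collections import Counter


def fit_expert(concepts, labels, min_cover=1):
    """Toy interpretable expert (assumption, not the paper's model):
    pick the literal `concept[j] == v` whose covered samples have the
    purest Blackbox label, breaking ties by larger coverage."""
    best = None
    for j in range(len(concepts[0])):
        for v in (0, 1):
            covered = [y for c, y in zip(concepts, labels) if c[j] == v]
            if len(covered) < min_cover:
                continue
            label, hits = Counter(covered).most_common(1)[0]
            score = (hits / len(covered), len(covered))
            if best is None or score > best[0]:
                best = (score, j, v, label)
    return best


def route_interpret_repeat(concepts, blackbox_labels,
                           target_coverage=0.9, max_experts=5):
    """Greedy stand-in for the route/interpret/repeat loop.

    Returns (experts, residual): `experts` is a list of rules
    (concept index, concept value, predicted label) and `residual`
    holds the indices of samples no expert covers."""
    total = len(concepts)
    remaining = list(range(total))
    experts = []
    while remaining and len(experts) < max_experts:
        # Stop once the experts explain the desired proportion of data.
        if 1 - len(remaining) / total >= target_coverage:
            break
        best = fit_expert([concepts[i] for i in remaining],
                          [blackbox_labels[i] for i in remaining])
        if best is None:
            break
        _, j, v, label = best
        experts.append((j, v, label))
        # Route: samples matching the rule go to this expert,
        # the rest stay with the residual for the next iteration.
        remaining = [i for i in remaining if concepts[i][j] != v]
    return experts, remaining


# Four samples with two binary concepts each, plus Blackbox predictions.
concepts = [(1, 0), (1, 1), (0, 1), (0, 0)]
blackbox_labels = [1, 1, 0, 0]
experts, residual = route_interpret_repeat(concepts, blackbox_labels)
for j, v, label in experts:
    print(f"concept_{j} == {v}  ->  predict {label}")
```

Each returned rule reads as a one-literal logical explanation of the covered samples. Lowering `target_coverage` leaves more samples in `residual`, which mirrors the idea that the residual holds the samples that are harder to explain.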

