ML model design either starts with an interpretable model or starts with a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet interpretable models require extensive ML knowledge and tend to be less flexible than their Blackbox counterparts and to underperform them. This paper aims to blur the distinction between explaining a Blackbox post hoc and constructing an interpretable model. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until the interpretable models together explain the desired proportion of the data. Our extensive experiments show that our route, interpret, and repeat approach (1) identifies a diverse set of instance-specific concepts with high concept completeness via MoIE without compromising performance, (2) identifies, via the residuals, the samples that are relatively "harder" to explain, (3) outperforms interpretable-by-design models by significant margins during test-time interventions, and (4) fixes the shortcut learned by the original Blackbox. The code for MoIE is publicly available at: https://github.com/batmanlab/ICML-2023-Route-interpret-repeat.
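To make the route, interpret, and repeat loop concrete, the following is a toy sketch under heavy simplifying assumptions. It is not the authors' implementation (see the repository above for that). The assumptions are: samples are already represented as binary concept vectors (as if extracted from the Blackbox); each "expert" is a hypothetical single-concept rule rather than a learned FOL model; the rule's own antecedent stands in for a learned selector; and the residual is any callable. The names `Expert`, `fit_expert`, `route_interpret_repeat`, and `predict` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Toy data model (assumption): binary concepts plus a class label.
Sample = Tuple[Tuple[bool, ...], int]


@dataclass
class Expert:
    """Hypothetical single-concept expert: 'concept present -> predicts'."""
    concept_idx: int
    predicts: int

    def covers(self, concepts: Sequence[bool]) -> bool:
        # Stand-in for a learned selector: route here iff the concept is present.
        return bool(concepts[self.concept_idx])

    def rule(self, names: Sequence[str]) -> str:
        # Human-readable, FOL-like explanation of this expert's decision.
        return f"{names[self.concept_idx]} -> y={self.predicts}"


def fit_expert(samples: List[Sample], n_concepts: int) -> Optional[Expert]:
    """Pick the concept that exactly predicts one label on the most samples."""
    best: Optional[Expert] = None
    best_count = 0
    for c in range(n_concepts):
        labels = [y for x, y in samples if x[c]]
        # Keep only rules that are exact on the subset they cover.
        if labels and len(set(labels)) == 1 and len(labels) > best_count:
            best, best_count = Expert(c, labels[0]), len(labels)
    return best


def route_interpret_repeat(
    samples: List[Sample], n_concepts: int, target_coverage: float, max_iters: int = 5
) -> Tuple[List[Expert], List[Sample]]:
    experts: List[Expert] = []
    residual = list(samples)
    for _ in range(max_iters):
        # Stop once the experts explain the desired fraction of the data.
        if not samples or 1 - len(residual) / len(samples) >= target_coverage:
            break
        expert = fit_expert(residual, n_concepts)  # interpret
        if expert is None:
            break
        experts.append(expert)
        # Route: covered samples leave; the rest form the next residual pool.
        residual = [(x, y) for x, y in residual if not expert.covers(x)]
    return experts, residual


def predict(experts: List[Expert], concepts: Sequence[bool],
            residual_model: Callable, names: Sequence[str]):
    # At inference, try the experts in the order they were carved out;
    # samples no expert covers fall through to the residual.
    for e in experts:
        if e.covers(concepts):
            return e.predicts, e.rule(names)
    return residual_model(concepts), "residual"


names = ["feathers", "fur", "fins"]
data = [
    ((True, False, False), 0), ((True, False, False), 0), ((True, False, False), 0),
    ((False, True, False), 1), ((False, True, False), 1),
    ((False, False, True), 2), ((False, True, True), 1),
    ((False, False, False), 1),
]
experts, residual = route_interpret_repeat(data, n_concepts=3, target_coverage=0.7)
print([e.rule(names) for e in experts])  # ['feathers -> y=0', 'fur -> y=1']
print(len(residual))  # 2
```

In this toy run, two experts cover 6 of the 8 samples (75% coverage, above the 0.7 target), and the two samples no exact rule explains stay in the residual, loosely mirroring how the residual collects the "harder" samples.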