Learning Hybrid Interpretable Models: Theory, Taxonomy, and Methods

A hybrid model combines an interpretable model with a complex black box. At inference, a gating mechanism assigns each input to either the interpretable or the complex component. Such models offer two advantages over classical ones: (1) they give users precise control over the system's level of transparency, and (2) they can potentially outperform a standalone black box, since redirecting some inputs to an interpretable model implicitly acts as regularization. Despite this potential, hybrid models remain under-studied in the interpretability/explainability literature. In this paper, we address this gap by investigating such models from three perspectives: Theory, Taxonomy, and Methods. First, we study the generalization of hybrid models from the Probably-Approximately-Correct (PAC) perspective. A consequence of our PAC guarantee is the existence of a sweet spot for the optimal transparency of the system; when this sweet spot is attained, a hybrid model can potentially perform better than a standalone black box. Second, we provide a general taxonomy of the ways to train hybrid models, consisting of the Post-Black-Box and Pre-Black-Box paradigms, which differ in the order in which the interpretable and complex components are trained. We show where the state-of-the-art hybrid models Hybrid-Rule-Set and Companion-Rule-List fall in this taxonomy. Third, we implement the two paradigms in a single method, HybridCORELS, which extends the CORELS algorithm to hybrid modeling. By leveraging CORELS, HybridCORELS provides a certificate of optimality for its interpretable component and precise control over transparency. Finally, we show empirically that HybridCORELS is competitive with existing hybrid models and performs as well as a standalone black box (or even better) while being partly transparent.
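To make the gating idea concrete, the following is a minimal sketch of hybrid inference, not the HybridCORELS implementation. It assumes the interpretable component is an ordered list of if-then rules and that an input falls through to the black box when no rule fires; the names `Rule`, `HybridModel`, `predict`, and `transparency`, as well as the toy features `age` and `income`, are hypothetical and chosen only for illustration. Transparency is taken here to mean the fraction of inputs handled by the interpretable component.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical if-then rule: predict `label` when `condition` holds."""
    condition: Callable[[Sample], bool]
    label: int


@dataclass
class HybridModel:
    """Illustrative hybrid model: a rule list backed by a black box."""
    rules: List[Rule]
    black_box: Callable[[Sample], int]

    def predict(self, x: Sample) -> Tuple[int, str]:
        # Gating: the first rule that fires handles the input.
        for rule in self.rules:
            if rule.condition(x):
                return rule.label, "interpretable"
        # No rule fired, so the input is deferred to the black box.
        return self.black_box(x), "black_box"

    def transparency(self, samples: List[Sample]) -> float:
        # Fraction of inputs classified by the interpretable component.
        if not samples:
            return 0.0
        handled = sum(
            1 for x in samples if self.predict(x)[1] == "interpretable"
        )
        return handled / len(samples)


# Toy usage with made-up rules and a stand-in for a trained black box.
model = HybridModel(
    rules=[
        Rule(lambda x: x["age"] < 25, 0),
        Rule(lambda x: x["income"] > 100, 1),
    ],
    black_box=lambda x: int(x["age"] * x["income"] > 3000),
)
samples = [
    {"age": 20, "income": 50},
    {"age": 40, "income": 150},
    {"age": 40, "income": 50},
    {"age": 30, "income": 120},
]
predictions = [model.predict(x) for x in samples]
coverage = model.transparency(samples)
```

In this toy setup three of the four inputs are captured by a rule and one is deferred to the black box. Adding, removing, or tightening rules changes that fraction, which is the sense in which a hybrid model exposes transparency as a controllable quantity.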
