As large deep learning models are hard to interpret, there has been a recent focus on generating explanations of these black-box models. In contrast, we may have a priori explanations of how models should behave. In this paper, we formalize this notion as learning from explanation constraints and provide a learning-theoretic framework to analyze how such explanations can improve the learning of our models. One may naturally ask, "When would these explanations be helpful?" Our first key contribution addresses this question via a class of models that satisfy these explanation constraints in expectation over new data. We characterize the benefits of these models, in terms of the reduction of their Rademacher complexities, for a canonical class of explanations given by gradient information, in the settings of both linear models and two-layer neural networks. In addition, we provide an algorithmic solution for our framework via a variational approximation that achieves better performance and satisfies these constraints more frequently than simpler augmented Lagrangian methods for incorporating these explanations. We demonstrate the benefits of our approach across a large array of synthetic and real-world experiments.
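To make the idea of a gradient-based explanation constraint concrete, the following is a minimal sketch for the linear-model setting. It is not the variational method or the augmented Lagrangian baseline described above; it uses a plain quadratic penalty as a simpler stand-in. The function name `fit_with_gradient_penalty`, the choice of "irrelevant feature" constraints, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np

def fit_with_gradient_penalty(X, y, irrelevant, lam=10.0, lr=0.05, steps=2000):
    """Least squares for f(x) = w @ x, plus a quadratic penalty on the
    input-gradient along features assumed (a priori) to be irrelevant.

    Hypothetical helper: a simple penalty method, not the paper's algorithm.
    """
    n, d = X.shape
    w = np.zeros(d)
    mask = np.zeros(d)
    mask[irrelevant] = 1.0
    for _ in range(steps):
        residual = X @ w - y
        grad_loss = X.T @ residual / n
        # For a linear model, df/dx = w at every x, so the expected squared
        # constraint violation over data is the sum of w_j^2 over masked j.
        grad_penalty = 2.0 * lam * mask * w
        w -= lr * (grad_loss + grad_penalty)
    return w

# Synthetic data (assumed setup): features 2 and 3 truly have no effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.5 * rng.normal(size=50)

w_plain = fit_with_gradient_penalty(X, y, irrelevant=[2, 3], lam=0.0)
w_constrained = fit_with_gradient_penalty(X, y, irrelevant=[2, 3], lam=10.0)

violation_plain = float(np.sum(w_plain[[2, 3]] ** 2))
violation_constrained = float(np.sum(w_constrained[[2, 3]] ** 2))
print("violation without penalty:", violation_plain)
print("violation with penalty:   ", violation_constrained)
```

In this toy setting the penalty shrinks the model's gradient along the features the explanation marks as irrelevant, which effectively restricts the hypothesis class. This is the kind of restriction whose benefit the abstract describes in terms of reduced Rademacher complexity.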