Because large deep learning models are hard to interpret, there has been a recent focus on generating explanations of these black-box models. In contrast, we may have a priori explanations of how models should behave. In this paper, we formalize this notion as learning from explanation constraints and provide a learning-theoretic framework to analyze how such explanations can improve the learning of our models. One may naturally ask, "When would these explanations be helpful?" Our first key contribution addresses this question via a class of models that satisfies these explanation constraints in expectation over new data. We characterize the benefits of these models (in terms of the reduction of their Rademacher complexities) for a canonical class of explanations given by gradient information, in the settings of both linear models and two-layer neural networks. In addition, we provide an algorithmic solution for our framework via a variational approximation that achieves better performance, and satisfies these constraints more frequently, than simpler augmented Lagrangian methods for incorporating these explanations. We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
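To make "explanations given by gradient information" concrete for the linear case: for f(x) = w·x, the input gradient ∂f/∂x is simply w for every x, so an a priori explanation such as "the prediction should not depend on feature j" becomes the constraint w_j = 0, which shrinks the hypothesis class. The sketch below illustrates this with a plain quadratic penalty on the input gradient, fit by gradient descent. It is an illustrative assumption, not the paper's variational method or its augmented Lagrangian baseline; the helper name `fit_with_gradient_penalty`, the synthetic data, and the values of `lam`, `lr`, and `steps` are all hypothetical.

```python
import numpy as np


def fit_with_gradient_penalty(X, y, zero_grad_features, lam=10.0, lr=0.05, steps=2000):
    """Hypothetical helper: least-squares linear regression plus a soft
    explanation constraint pushing df/dx_j toward 0 for j in zero_grad_features.

    For f(x) = w @ x the input gradient is w at every x, so the penalty
    (lam / 2) * sum_j (df/dx_j)^2 averaged over the data is (lam / 2) * sum_j w_j^2.
    """
    n, d = X.shape
    w = np.zeros(d)
    mask = np.zeros(d)
    for j in zero_grad_features:
        mask[j] = 1.0
    for _ in range(steps):
        residual = X @ w - y
        grad_loss = X.T @ residual / n        # gradient of the mean squared error / 2
        grad_penalty = lam * mask * w         # gradient of the explanation penalty
        w -= lr * (grad_loss + grad_penalty)
    return w


# Synthetic data (assumed for illustration): feature 4 carries signal in the
# training data, but our prior explanation says the model should ignore it.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(500)

w_free = fit_with_gradient_penalty(X, y, zero_grad_features=[])
w_con = fit_with_gradient_penalty(X, y, zero_grad_features=[4])
print("unconstrained:", np.round(w_free, 3))
print("constrained:  ", np.round(w_con, 3))
```

With the penalty active, the weight on feature 4 (equivalently, its input gradient) is driven close to zero while the other coefficients stay near their unconstrained values. A quadratic penalty only satisfies the constraint approximately; a larger `lam` tightens it at the cost of a stiffer optimization problem.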