Sparse linear models are a gold-standard tool for interpretable machine learning, a field of emerging importance as predictive models permeate decision-making in many domains. Unfortunately, sparse linear models are far less flexible as functions of their input features than black-box models like deep neural networks. With this capability gap in mind, we study a not-uncommon situation where the input features dichotomize into two groups: explanatory features, which are candidates for inclusion as variables in an interpretable model, and contextual features, which select from the candidate variables and determine their effects. This dichotomy leads us to the contextual lasso, a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features. The fitting process learns this function nonparametrically via a deep neural network. To attain sparse coefficients, we train the network with a novel lasso regularizer in the form of a projection layer that maps the network's output onto the space of $\ell_1$-constrained linear models. An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso without sacrificing the predictive power of a standard deep neural network.
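To make the idea of a projection onto an $\ell_1$-constrained set concrete, the following is a minimal sketch of the standard sort-based Euclidean projection of a single coefficient vector onto an $\ell_1$ ball. It is an illustration under assumptions, not the projection layer described above: the function name `project_l1_ball`, the `radius` parameter, and the per-vector form of the constraint are hypothetical choices made here, and the layer used in the contextual lasso may define its constraint and projection differently.

```python
import numpy as np


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= radius}.

    Hypothetical helper for illustration only; uses the classic
    sort-and-threshold algorithm for l1-ball projection.
    """
    v = np.asarray(v, dtype=float)
    if radius <= 0:
        raise ValueError("radius must be positive")
    # Already feasible: the projection is the point itself.
    if np.abs(v).sum() <= radius:
        return v.copy()
    # Sort magnitudes in decreasing order and find the soft threshold.
    u = np.sort(np.abs(v))[::-1]
    cssv = np.cumsum(u)
    k = np.arange(1, u.size + 1)
    rho = np.nonzero(u * k > cssv - radius)[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)
    # Soft-threshold: small coefficients are set exactly to zero.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


# Pretend these are coefficients emitted by a network for one context.
raw_coefficients = np.array([3.0, -1.0, 0.5])
sparse_coefficients = project_l1_ball(raw_coefficients, radius=2.0)
```

In this example the vector (3, -1, 0.5) is projected onto the ball of radius 2, giving (2, 0, 0): the two smaller coefficients are zeroed out, which illustrates how a projection of this kind can induce sparsity rather than merely shrinking the coefficients.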