We propose to minimize a generic differentiable objective with an $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. Our proposal is a direct generalization of previous ideas that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, \textit{spred}, is an exact differentiable solver of $L_1$-penalized problems and that the reparametrization trick is completely ``benign'' for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks to perform gene selection tasks, which involve finding relevant features in a very high-dimensional space, and (2) neural network compression, a task in which previous attempts at applying the $L_1$ penalty have been unsuccessful. Conceptually, our result bridges the gap between sparsity in deep learning and conventional statistical learning.
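
As a brief illustration of the kind of equivalence referred to above, the following sketch may help; the symbols $L$, $w$, $u$, $v$, and $\kappa$ are notation assumed here for illustration and are not fixed by the abstract itself. Suppose the parameter is written elementwise as $w = u \odot v$. By the AM--GM inequality, $u_i^2 + v_i^2 \ge 2|u_i v_i|$, with equality if and only if $|u_i| = |v_i|$, so that
\[
\min_{u \odot v = w} \; \kappa \left( \|u\|_2^2 + \|v\|_2^2 \right) \;=\; 2\kappa \|w\|_1 .
\]
Consequently,
\[
\min_{u,\,v} \; \Big[ L(u \odot v) + \kappa \left( \|u\|_2^2 + \|v\|_2^2 \right) \Big] \;=\; \min_{w} \; \Big[ L(w) + 2\kappa \|w\|_1 \Big],
\]
so under these assumptions the global minima of the weight-decayed, reparametrized objective correspond to those of the $L_1$-penalized objective. This sketch covers only the global minima; it does not by itself establish anything about the other stationary points of a nonconvex $L$.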