We propose to minimize a generic differentiable objective with an $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. Our proposal directly generalizes previous ideas that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, \textit{spred}, is an exact differentiable solver of the $L_1$ penalty and that the reparametrization trick is completely ``benign'' for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks to perform gene selection tasks, which involve finding relevant features in a very high-dimensional space, and (2) neural network compression, a task in which previous attempts at applying the $L_1$ penalty have been unsuccessful. Conceptually, our result bridges the gap between sparsity in deep learning and conventional statistical learning.
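As a minimal illustration of how a reparametrization with weight decay can stand in for an $L_1$ penalty, consider the scalar problem $\min_w \frac{1}{2}(w-a)^2 + \lambda |w|$, whose solution is the soft-thresholding of $a$. The sketch below assumes the elementwise-product form $w = uv$ with weight decay $\frac{\lambda}{2}(u^2+v^2)$; this specific form is an assumption made for illustration (the text above does not spell it out), motivated by the identity $\min_{uv=w} \frac{1}{2}(u^2+v^2) = |w|$. It uses plain (non-stochastic) gradient descent, and the names `soft_threshold` and `solve_reparam`, the step size, and the initialization are all hypothetical choices.

```python
def soft_threshold(a, lam):
    """Closed-form minimizer of 0.5*(w - a)**2 + lam*abs(w)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def solve_reparam(a, lam, lr=0.05, steps=5000):
    """Gradient descent on 0.5*(u*v - a)**2 + 0.5*lam*(u**2 + v**2).

    Assumed reparametrization: w = u * v, with weight decay on u and v
    replacing the non-differentiable penalty lam * |w|.
    """
    # Asymmetric initialization (illustrative choice): starting with u == v
    # would keep u*v >= 0 forever and miss negative solutions.
    u, v = 1.0, 0.2
    for _ in range(steps):
        r = u * v - a                 # derivative of the data term w.r.t. w
        grad_u = r * v + lam * u      # chain rule plus weight decay on u
        grad_v = r * u + lam * v      # chain rule plus weight decay on v
        u, v = u - lr * grad_u, v - lr * grad_v
    return u * v


if __name__ == "__main__":
    lam = 1.0
    for a in [3.0, -2.0, 0.5, -0.3]:
        print(a, solve_reparam(a, lam), soft_threshold(a, lam))
```

In this toy setting the smooth objective in $(u, v)$ is handled by an ordinary gradient method, and inputs with $|a| < \lambda$ are driven to zero, which is the sparsity-inducing behavior expected of an $L_1$ penalty.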