We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. Our reparameterization has the following intriguing property: gradient descent on the squared regression loss, without any explicit regularization, is biased towards solutions with a group sparsity structure. In contrast to many existing works on implicit regularization, we prove that our training trajectory cannot be simulated by mirror descent. We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates. Compared to existing bounds for implicit sparse regularization using diagonal linear networks, our analysis with the new reparameterization shows improved sample complexity. In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression. Finally, we demonstrate the efficacy of our approach with several numerical experiments.
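To make the setting concrete, the following is a minimal sketch of unregularized gradient descent on a squared loss under a grouped reparameterization. The abstract does not state the exact form of the reparameterization, so the map used here (each group's weights written as a squared scalar `u_l ** 2` times a per-group vector `v_l`) is an assumption, as are the function names, the zero initialization of `v`, the small initialization `alpha`, the step size, and the synthetic data. It is an illustration of the general idea, not the paper's algorithm.

```python
import numpy as np


def build_weights(u, v, group_size):
    # Assumed reparameterization: the weights of group l are u_l**2 * v_l.
    return np.repeat(u ** 2, group_size) * v


def squared_loss(X, y, w):
    # Squared regression loss, (1 / 2n) * ||Xw - y||^2, with no regularizer.
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)


def train(X, y, num_groups, group_size, alpha=0.1, lr=0.01, steps=5000):
    # Hypothetical initialization: small equal scales u, zero vectors v.
    u = np.full(num_groups, alpha)
    v = np.zeros(num_groups * group_size)
    n = X.shape[0]
    for _ in range(steps):
        w = build_weights(u, v, group_size)
        grad_w = X.T @ (X @ w - y) / n
        # Chain rule through w_i = u_l**2 * v_i for each i in group l.
        grad_v = np.repeat(u ** 2, group_size) * grad_w
        grad_u = 2.0 * u * (v * grad_w).reshape(num_groups, group_size).sum(axis=1)
        u = u - lr * grad_u
        v = v - lr * grad_v
    return build_weights(u, v, group_size)


def run_demo(seed=0):
    # Synthetic group-sparse problem: 5 groups of size 4, one active group.
    rng = np.random.default_rng(seed)
    num_groups, group_size, n = 5, 4, 100
    w_true = np.zeros(num_groups * group_size)
    w_true[:group_size] = 1.0
    X = rng.standard_normal((n, num_groups * group_size))
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    w_hat = train(X, y, num_groups, group_size)
    loss_start = squared_loss(X, y, np.zeros_like(w_true))
    loss_end = squared_loss(X, y, w_hat)
    return w_hat, loss_start, loss_end
```

Calling `run_demo()` returns the fitted weight vector together with the loss at the zero vector and at the fitted weights. Inspecting the per-group norms of the fitted weights, for example with `np.linalg.norm(w_hat.reshape(5, 4), axis=1)`, is one way to check whether the inactive groups stay near zero in this toy setting.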