In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks. We prove the convergence of GD and SGD with macroscopic stepsizes in an overparametrised regression setting and characterise their solutions through an implicit regularisation problem. This crisp characterisation yields qualitative insights into how stochasticity and stepsizes affect the recovered solution. Specifically, we show that large stepsizes consistently benefit SGD for sparse regression problems, whereas they can hinder the recovery of sparse solutions for GD. These effects are magnified for stepsizes in a tight window just below the divergence threshold, in the "edge of stability" regime. Our findings are supported by experimental results.
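To make the setting concrete, here is a minimal pure-Python sketch of GD and SGD on a diagonal linear network for least-squares regression. It assumes the parametrisation β = u ⊙ v (elementwise product), the initialisation u₀ = √2·α·1 and v₀ = 0, a constant stepsize, and mini-batches drawn uniformly with replacement for SGD. The function names (`predict`, `loss`, `grad_beta`, `train`) and the default `alpha` are hypothetical choices for illustration, not the paper's code.

```python
import random


def predict(beta, x):
    """Linear prediction <beta, x>."""
    return sum(b * xi for b, xi in zip(beta, x))


def loss(beta, X, y):
    """Least-squares loss (1/2n) * sum_i (<beta, x_i> - y_i)^2."""
    return sum((predict(beta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * len(y))


def grad_beta(beta, X, y, idx):
    """Gradient w.r.t. beta of the loss restricted to the samples in idx."""
    g = [0.0] * len(beta)
    for i in idx:
        r = predict(beta, X[i]) - y[i]  # residual of sample i
        for j, xij in enumerate(X[i]):
            g[j] += r * xij
    return [gj / len(idx) for gj in g]


def train(X, y, stepsize, steps, alpha=0.1, batch=None, seed=0):
    """Run GD (batch=None) or SGD drawing `batch` samples per step.

    Hypothetical setup: beta = u * v elementwise, u0 = sqrt(2) * alpha, v0 = 0.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    u = [alpha * 2 ** 0.5] * d
    v = [0.0] * d
    for _ in range(steps):
        if batch is None:
            idx = range(n)  # full-batch gradient: GD
        else:
            idx = [rng.randrange(n) for _ in range(batch)]  # sampled with replacement: SGD
        beta = [uj * vj for uj, vj in zip(u, v)]
        g = grad_beta(beta, X, y, idx)
        # Chain rule: dL/du = g * v and dL/dv = g * u; both updates use the old (u, v).
        u, v = (
            [uj - stepsize * gj * vj for uj, gj, vj in zip(u, g, v)],
            [vj - stepsize * gj * uj for uj, gj, vj in zip(u, g, v)],
        )
    return [uj * vj for uj, vj in zip(u, v)]
```

For instance, on the toy data `X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]`, `y = [1.0, 0.5]` (two samples, three parameters, hence overparametrised), both `train(X, y, 0.1, 2000)` and `train(X, y, 0.1, 2000, batch=1)` drive the training loss to essentially zero, while the unobserved third coordinate stays at 0. Comparing the interpolators reached for different `stepsize` and `batch` values on less trivial data is one way to explore the effects described above; this toy example is not the paper's experimental setup.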