In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks. We prove the convergence of GD and SGD with macroscopic stepsizes in an overparametrised regression setting and characterise their solutions through an implicit regularisation problem. Our crisp characterisation leads to qualitative insights about the impact of stochasticity and stepsizes on the recovered solution. Specifically, we show that large stepsizes consistently benefit SGD for sparse regression problems, while they can hinder the recovery of sparse solutions for GD. These effects are magnified for stepsizes in a tight window just below the divergence threshold, in the "edge of stability" regime. Our findings are supported by experimental results.
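To make the setting concrete, the following is a minimal sketch and not the authors' code. It assumes a two-layer diagonal linear network in which the regression vector is parametrised as beta = u * v (elementwise), a squared loss, and the initialisation u = sqrt(2) * alpha, v = 0. The names `train_dln` and `squared_loss`, the parameter names, and all default values are hypothetical choices made for illustration.

```python
import numpy as np


def squared_loss(X, y, beta):
    """Mean squared loss 1/(2n) * ||X beta - y||^2."""
    residual = X @ beta - y
    return 0.5 * np.mean(residual ** 2)


def train_dln(X, y, stepsize, alpha=0.1, batch_size=None, n_iters=2000, seed=0):
    """Run GD or mini-batch SGD on an assumed diagonal linear network beta = u * v.

    batch_size=None uses all samples at every step (GD); an integer
    batch_size samples that many rows without replacement at every step (SGD).
    alpha is the assumed initialisation scale.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    u = np.full(d, np.sqrt(2.0) * alpha)  # assumed initialisation
    v = np.zeros(d)
    for _ in range(n_iters):
        if batch_size is None:
            idx = np.arange(n)
        else:
            idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Gradient of the batch loss with respect to beta = u * v.
        grad_beta = Xb.T @ (Xb @ (u * v) - yb) / len(idx)
        # Chain rule: d(loss)/du = grad_beta * v, d(loss)/dv = grad_beta * u.
        # The tuple assignment updates both layers from the old values.
        u, v = u - stepsize * grad_beta * v, v - stepsize * grad_beta * u
    return u * v


if __name__ == "__main__":
    # Overparametrised sparse regression: fewer samples (n) than features (d).
    rng = np.random.default_rng(0)
    n, d = 20, 40
    X = rng.standard_normal((n, d))
    beta_star = np.zeros(d)
    beta_star[:3] = 1.0  # hypothetical sparse ground truth
    y = X @ beta_star

    beta_gd = train_dln(X, y, stepsize=0.01)
    beta_sgd = train_dln(X, y, stepsize=0.01, batch_size=5)
    print("GD  train loss:", squared_loss(X, y, beta_gd))
    print("SGD train loss:", squared_loss(X, y, beta_sgd))
    print("GD  distance to beta_star:", np.linalg.norm(beta_gd - beta_star))
    print("SGD distance to beta_star:", np.linalg.norm(beta_sgd - beta_star))
```

Varying `stepsize` while switching `batch_size` between `None` and a small integer is one way to explore the GD versus SGD comparison described in the abstract. The stepsize at which the iterates diverge depends on the data and has to be found empirically in this sketch.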