Cohen et al. (2021) empirically study how the largest eigenvalue of the loss Hessian, known as the sharpness, evolves along the gradient descent (GD) trajectory, and observe a phenomenon called the Edge of Stability (EoS). The sharpness increases in the early phase of training (referred to as progressive sharpening) and eventually saturates close to the threshold of $2 / \text{(step size)}$. In this paper, we first demonstrate empirically that when the EoS phenomenon occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram that is independent of initialization. We then rigorously prove this trajectory alignment phenomenon for a two-layer fully-connected linear network and for a single-neuron nonlinear network, each trained with a single data point. Our trajectory alignment analysis establishes both the progressive sharpening and the EoS phenomena, encompassing and extending recent findings in the literature.
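To make the quantities above concrete, the following is a minimal sketch, not the paper's experimental code, that tracks the sharpness along a GD trajectory and compares it with the threshold $2 / \text{(step size)}$. It assumes a scalar two-layer linear model $f(x) = u v x$ fit to the single data point $(x, y) = (1, y)$ with squared loss $\tfrac{1}{2}(uv - y)^2$; the names `loss`, `grad`, `sharpness`, and `run_gd`, as well as the initialization and step size, are illustrative choices rather than anything taken from the paper.

```python
import math


def loss(u, v, y=1.0):
    """Squared loss 0.5 * (u*v - y)^2 for the assumed scalar model f(x) = u*v*x at x = 1."""
    r = u * v - y
    return 0.5 * r * r


def grad(u, v, y=1.0):
    """Gradient of the loss with respect to (u, v)."""
    r = u * v - y
    return r * v, r * u


def sharpness(u, v, y=1.0):
    """Largest eigenvalue of the 2x2 loss Hessian [[v^2, 2uv - y], [2uv - y, u^2]]."""
    a, c = v * v, u * u
    b = 2.0 * u * v - y
    # Closed form for the top eigenvalue of a symmetric 2x2 matrix.
    return 0.5 * (a + c) + math.hypot(0.5 * (a - c), b)


def run_gd(u, v, y, lr, steps):
    """Run plain GD and record (loss, sharpness) before each update."""
    history = []
    for _ in range(steps):
        history.append((loss(u, v, y), sharpness(u, v, y)))
        gu, gv = grad(u, v, y)
        u, v = u - lr * gu, v - lr * gv
    return u, v, history


if __name__ == "__main__":
    lr = 0.1  # hypothetical step size; the stability threshold is 2 / lr
    u, v, history = run_gd(u=0.5, v=0.5, y=1.0, lr=lr, steps=200)
    for t in range(0, 200, 40):
        cur_loss, cur_sharp = history[t]
        print(f"step {t:3d}  loss {cur_loss:.3e}  sharpness {cur_sharp:.4f}  2/lr {2 / lr:.1f}")
```

With this small step size the run simply converges with the sharpness well below $2 / \text{(step size)}$. Larger step sizes or unbalanced initializations can be tried to probe the regime near the threshold, though this sketch makes no claim about reproducing the bifurcation diagram described above.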