The Edge of Stability (EoS) phenomenon, in which gradient descent (GD) trains with sharpness (the largest Hessian eigenvalue) at or above the classical stability threshold 2/η for step size η, yet the loss still decreases over long timescales, is ubiquitous in modern deep learning but remains poorly understood in realistic settings. Prior rigorous analyses have been largely confined to scalar or low-dimensional losses with specific structural forms. In this work, we develop a bifurcation-theoretic framework for GD at the edge of stability that applies directly to overparameterized neural networks. By decomposing the training dynamics into components normal and tangent to the manifold of minimizers, we show that stable EoS training arises from a flip (period-doubling) bifurcation in the normal direction whose stability is governed by the sign of the first Lyapunov coefficient, while the tangent dynamics drift toward regions of lower sharpness. Under mild spectral and geometric assumptions on the loss landscape, we prove that GD run at the EoS threshold converges to the manifold of minimizers. As a corollary, we recover and unify prior results; in particular, the product-stability condition of Gan (2026) is a special case of our framework.
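As a minimal scalar sketch of the normal-direction mechanism only (not the paper's construction), consider GD with step size η on the hypothetical toy loss L(x) = log cosh x. Its minimizer x = 0 has sharpness L''(0) = 1, so the classical threshold is η = 2. The GD map x ↦ x − η tanh x is odd, so its quadratic term vanishes and the criticality of the flip at η = 2 is decided by the sign of the cubic term. Because sharpness decreases away from the minimizer here, the flip is supercritical: for η slightly above 2 the iterates settle onto a stable period-2 orbit {a, −a} with η tanh a = 2a instead of diverging. The loss, the function names, and the bisection solver are illustrative assumptions, and the sketch says nothing about the tangent drift or the high-dimensional setting.

```python
import math


def gd_step(x: float, eta: float) -> float:
    """One GD step on the hypothetical toy loss L(x) = log(cosh(x)), whose gradient is tanh(x)."""
    return x - eta * math.tanh(x)


def run_gd(x0: float, eta: float, steps: int) -> list[float]:
    """Return the GD trajectory [x0, x1, ..., x_steps]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(gd_step(xs[-1], eta))
    return xs


def cycle_amplitude(eta: float, tol: float = 1e-12) -> float:
    """Solve eta * tanh(a) = 2a for a > 0 by bisection.

    {a, -a} is then a period-2 orbit of the GD map; a positive root exists only for eta > 2.
    """
    if eta <= 2.0:
        raise ValueError("a nonzero period-2 orbit requires eta > 2 (sharpness 1 at the minimizer)")

    def h(a: float) -> float:
        # Positive just above 0 when eta > 2, negative at a = eta since tanh < 1.
        return eta * math.tanh(a) - 2.0 * a

    lo, hi = 1e-9, eta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cycle_multiplier(eta: float) -> float:
    """Derivative of the two-step map on the orbit: (1 - eta * sech^2(a))^2; below 1 means stable."""
    a = cycle_amplitude(eta)
    slope = 1.0 - eta / math.cosh(a) ** 2
    return slope * slope


if __name__ == "__main__":
    eta = 2.1  # just past the classical threshold 2 / L''(0) = 2
    xs = run_gd(0.1, eta, 2000)
    print("last iterates:", xs[-2], xs[-1])
    print("predicted amplitude:", cycle_amplitude(eta))
    print("two-step multiplier:", cycle_multiplier(eta))
    print("eta = 1.9, final iterate:", run_gd(0.1, 1.9, 2000)[-1])
```

With η = 2.1 the minimizer is unstable (multiplier 1 − η = −1.1), yet the iterates do not blow up: they lock onto the orbit with amplitude about 0.39, whose two-step multiplier is about 0.66 < 1. With η = 1.9 the same start simply converges to the minimizer.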