Gradient descent in deep learning may operate at the edge of stability (EoS), a regime in which the largest eigenvalue of the loss Hessian hovers near the stability threshold $2/η$, where $η$ is the learning rate. Classical analysis tools such as gradient flow and the descent lemma do not apply here, motivating the search for a continuous-time model valid at EoS. We propose Edge Flow, a system of three coupled ordinary differential equations that provides a tractable, faithful, and predictive model of gradient descent dynamics at EoS. Edge Flow decomposes the dynamics into a center, an oscillation direction, and an oscillation magnitude. The center follows a modified gradient flow on a symmetrized loss; the direction tracks a top eigenvector of the Hessian via Rayleigh quotient dynamics; and the magnitude grows or decays exponentially depending on whether the sharpness exceeds or falls below the threshold $2/η$. Crucially, sharpness stabilization emerges from the coupled dynamics via a self-stabilization feedback loop. Discretizing Edge Flow requires only two gradient evaluations and one Hessian--vector product per iteration. We demonstrate empirically that Edge Flow tracks the dynamics of gradient descent at least as faithfully as previously proposed continuous-time EoS models, and that it additionally resolves the oscillation of the sharpness at the onset of EoS. It also provides a principled framework for understanding and mitigating instabilities in this regime.
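The role of the threshold $2/η$ can be illustrated on a toy problem. The sketch below is not Edge Flow itself; it assumes a one-dimensional quadratic loss $L(x) = \tfrac{1}{2} λ x^2$, whose sharpness is the constant $λ$, and the names `gd_on_quadratic`, `sharpness`, and `lr` are hypothetical. On this loss each gradient descent step multiplies $x$ by $1 - ηλ$, so the iterate oscillates in sign and shrinks when $λ < 2/η$ but grows when $λ > 2/η$.

```python
def gd_on_quadratic(sharpness, lr, x0=1.0, steps=50):
    """Run gradient descent on L(x) = 0.5 * sharpness * x**2.

    Returns |x| after `steps` iterations. Illustrative toy only:
    the sharpness here is fixed, unlike in a neural network loss.
    """
    x = x0
    for _ in range(steps):
        grad = sharpness * x   # dL/dx for the quadratic loss
        x = x - lr * grad      # plain gradient descent step
    return abs(x)


lr = 0.1
threshold = 2.0 / lr  # stability threshold 2/eta

# Sharpness 10% below the threshold: per-step factor is about -0.8, so |x| decays.
below = gd_on_quadratic(0.9 * threshold, lr)

# Sharpness 10% above the threshold: per-step factor is about -1.2, so |x| grows.
above = gd_on_quadratic(1.1 * threshold, lr)

print(f"below threshold: |x| = {below:.3e}")
print(f"above threshold: |x| = {above:.3e}")
```

In this toy setting the growth never stops because the curvature cannot change; the self-stabilization described in the abstract relies on the sharpness itself responding to the oscillation, which a fixed quadratic cannot capture.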