Full-batch gradient descent on neural networks drives the largest Hessian eigenvalue to the threshold $2/\eta$, where $\eta$ is the learning rate. This phenomenon, the Edge of Stability, has resisted a unified explanation: existing accounts establish self-regulation near the edge but do not explain why the trajectory is forced toward $2/\eta$ from arbitrary initialization. We introduce the edge coupling, a functional on consecutive iterate pairs whose coefficient is uniquely fixed by the gradient-descent update. Differencing its criticality condition produces a step recurrence with stability boundary $2/\eta$, and a second-order expansion yields a loss-change formula whose telescoping sum forces curvature toward $2/\eta$. The two formulas involve different Hessian averages, but the mean value theorem localizes each to the true Hessian at an interior point of the step segment, yielding exact forcing of the Hessian eigenvalue with no gap. Setting both gradients of the edge coupling to zero classifies fixed points and period-two orbits; near a fixed point, the problem reduces to a function of the half-amplitude alone, which determines which directions support period-two orbits and on which side of the critical learning rate they appear.
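For context on where the $2/\eta$ threshold comes from, the following is the standard one-dimensional quadratic computation; it is a minimal sketch of the classical stability boundary, not the edge-coupling argument itself. On $L(x) = \tfrac{\lambda}{2}x^2$, gradient descent is a linear map:
% Gradient descent on L(x) = (lambda/2) x^2 with step size eta:
% the iterate map is linear, and it contracts exactly when
% |1 - eta*lambda| < 1, i.e. 0 < lambda < 2/eta.
\[
  x_{t+1} \;=\; x_t - \eta L'(x_t) \;=\; (1 - \eta\lambda)\,x_t,
  \qquad
  |1 - \eta\lambda| < 1 \;\iff\; 0 < \lambda < \frac{2}{\eta}.
\]
% At the boundary lambda = 2/eta the map is x -> -x, so every
% nonzero x_0 lies on the two-cycle (x_0, -x_0).
\[
  \lambda = \frac{2}{\eta} \;\Longrightarrow\; x_{t+1} = -x_t .
\]
At the boundary every nonzero initialization lies on a two-cycle $(x_0, -x_0)$, which is the degenerate quadratic analogue of the period-two orbits classified in the abstract; the nonlinear setting determines which directions actually support such orbits and on which side of the critical learning rate they appear.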