The Edge of Stability (EoS) is a phenomenon in which the sharpness (the largest eigenvalue of the Hessian) rises to, and then hovers near, the stability threshold $2/\eta$ during gradient descent (GD) with step size $\eta$. Although EoS appears to violate classical smoothness assumptions, it has been widely observed in deep learning, and its theoretical foundations remain incomplete. We provide an interpretation of EoS through the lens of Directional Smoothness [Mishkin et al., 2024]. This interpretation extends naturally to non-Euclidean norms, which we use to define a generalized sharpness under an arbitrary norm. Our generalized sharpness measure recovers the previously studied cases of vanilla GD and preconditioned GD, and also covers methods for which EoS has not yet been studied, such as $\ell_{\infty}$-descent, block coordinate descent (Block CD), Spectral GD, and their normalized versions. Through experiments on neural networks, we show that non-Euclidean GD, measured with our generalized sharpness, also exhibits progressive sharpening followed by oscillations around or above the threshold $2/\eta$. Practically, our framework provides a geometry-aware spectral diagnostic that can be applied across a broad class of non-Euclidean gradient methods.
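To make the classical threshold $2/\eta$ concrete, the following is a minimal sketch of the Euclidean special case only, not the generalized sharpness proposed here. It assumes a toy quadratic $f(x) = \tfrac{1}{2}\sum_i h_i x_i^2$ with a hypothetical diagonal Hessian `H_DIAG`, estimates the sharpness by power iteration on Hessian-vector products, and runs vanilla GD with step sizes on either side of $2/\lambda_{\max}$. The names `power_iteration`, `hvp`, and `gd_final_loss` are illustrative.

```python
import math

# Hypothetical toy problem: f(x) = 0.5 * sum(h_i * x_i**2), so the Hessian is diag(H_DIAG).
H_DIAG = [4.0, 1.0, 0.25]


def hvp(v):
    """Hessian-vector product for the toy quadratic (Hessian is diagonal)."""
    return [h * a for h, a in zip(H_DIAG, v)]


def power_iteration(hvp_fn, dim, iters=200):
    """Estimate the dominant eigenvalue of a symmetric operator via power iteration."""
    v = [1.0 / math.sqrt(dim)] * dim  # unit-norm starting vector
    lam = 0.0
    for _ in range(iters):
        w = hvp_fn(v)
        lam = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    return lam


def loss(x):
    return 0.5 * sum(h * a * a for h, a in zip(H_DIAG, x))


def gd_final_loss(eta, steps=50):
    """Run vanilla GD from x0 = (1, 1, 1) and return the final loss."""
    x = [1.0, 1.0, 1.0]
    for _ in range(steps):
        x = [a - eta * h * a for h, a in zip(H_DIAG, x)]  # gradient is H x
    return loss(x)


sharpness = power_iteration(hvp, len(H_DIAG))
critical_eta = 2.0 / sharpness  # GD on this quadratic is stable only for eta below this value
initial_loss = loss([1.0, 1.0, 1.0])

print(f"sharpness ~ {sharpness:.4f}, critical step size ~ {critical_eta:.4f}")
print(f"eta slightly below: final loss {gd_final_loss(0.9 * critical_eta):.3e}")
print(f"eta slightly above: final loss {gd_final_loss(1.1 * critical_eta):.3e}")
```

On a fixed quadratic the sharpness is constant, so GD either converges or diverges. The EoS regime described above arises in neural networks, where the sharpness changes during training and settles near $2/\eta$ instead of staying fixed.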