Existing analyses of the edge of stability (EoS) treat it as a global property of optimization. We show that it is also selective: the stability constraint redistributes learning across subsets of the training distribution, amplifying progress on some groups while suppressing progress on others. Using a branching intervention that enters or exits the EoS regime from the same training state, we causally demonstrate this trade-off and identify two necessary conditions for a group to benefit. First, its aggregate gradient must align with the top Hessian eigenvector. We isolate this mechanism with a controlled perturbation that preserves distance but randomizes direction, destroying alignment and eliminating the advantage. Second, the group must sustain non-vanishing gradient magnitude over time. Under cross-entropy loss, gradient saturation decouples confidently classified groups, shifting the advantage to output-outliers, whose gradients persist. Together, these results show that EoS functions not only as a stability boundary, but as a mechanism governing the allocation of learning across the data distribution.
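The two conditions can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It relies on the standard identity that the cross-entropy gradient with respect to the logits is softmax(z) - onehot(y), and it assumes a Hessian small enough to form explicitly. The function names, the toy Hessian, and `random_direction_same_norm` (one possible reading of a perturbation that preserves distance but randomizes direction) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ce_logit_grad_norm(logits, labels):
    """Per-example L2 norm of d(cross-entropy)/d(logits) = softmax(z) - onehot(y)."""
    g = softmax(logits)
    g[np.arange(len(labels)), labels] -= 1.0
    return np.linalg.norm(g, axis=-1)

def top_eigvec_alignment(hessian, group_grad):
    """Absolute cosine between a group's aggregate gradient and the
    eigenvector of the largest Hessian eigenvalue."""
    _, eigvecs = np.linalg.eigh(hessian)  # eigenvalues in ascending order
    v_top = eigvecs[:, -1]                # unit-norm top eigenvector
    return abs(v_top @ group_grad) / np.linalg.norm(group_grad)

def random_direction_same_norm(vec, rng):
    """Hypothetical control: keep the norm of `vec`, draw a random direction."""
    d = rng.standard_normal(vec.shape)
    return d * (np.linalg.norm(vec) / np.linalg.norm(d))

# Condition 2: a confidently classified example has a nearly vanishing
# gradient, while an example with uniform logits keeps a large one.
logits = np.array([[10.0, 0.0, 0.0],   # confident and correct
                   [0.0, 0.0, 0.0]])   # maximally uncertain
labels = np.array([0, 0])
norms = ce_logit_grad_norm(logits, labels)

# Condition 1: alignment with the top eigenvector of a toy 2x2 Hessian.
H = np.diag([1.0, 5.0])                # top eigenvector lies along axis 1
aligned = top_eigvec_alignment(H, np.array([0.0, 2.0]))
orthogonal = top_eigvec_alignment(H, np.array([2.0, 0.0]))

# Control: the perturbed vector has the same norm but a random direction.
rng = np.random.default_rng(0)
step = np.array([0.0, 2.0])
perturbed = random_direction_same_norm(step, rng)
```

In this toy setting the confident example contributes almost no gradient, whereas the uncertain one does, and only the gradient along the sharpest direction registers as aligned. For a real network the Hessian would not be formed explicitly; the top eigenvector would typically be estimated with Hessian-vector products and power iteration, which this sketch omits.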