Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under non-IID data the residual error can decay slowly, causing gradient mismatch and stalled progress in early rounds. We propose step-ahead partial error feedback (SA-PEF), which integrates step-ahead (SA) correction with partial error feedback (PEF). SA-PEF recovers EF when the step-ahead coefficient $\alpha=0$ and step-ahead EF (SAEF) when $\alpha=1$. For non-convex objectives and $\delta$-contractive compressors, we establish a second-moment bound and a residual recursion that guarantee convergence to stationarity under heterogeneous data and partial client participation. The resulting rates match standard non-convex Fed-SGD guarantees up to constant factors, achieving $O((\eta\,\eta_0 T R)^{-1})$ convergence to a variance/heterogeneity floor with a fixed inner step size. Our analysis reveals a step-ahead-controlled residual contraction factor $\rho_r$ that explains the observed acceleration in the early training phase. To balance SAEF's rapid warm-up with EF's long-term stability, we select $\alpha$ near its theory-predicted optimum. Experiments across diverse architectures and datasets show that SA-PEF consistently reaches target accuracy faster than EF.
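To make the $\alpha$-interpolation concrete, the following is a minimal single-worker sketch, not the paper's algorithm: it assumes a top-$k$ sparsifier as the $\delta$-contractive compressor and assumes the step-ahead correction evaluates the gradient at the residual-corrected point $x - \alpha\eta e$. The names `top_k`, `sa_pef_step`, and `grad_fn`, and the exact update form, are illustrative assumptions. Setting $\alpha=0$ yields classic EF; $\alpha=1$ yields a fully step-ahead (SAEF-like) update.

```python
import numpy as np

def top_k(v, k):
    """Top-k sparsifier: a standard biased, delta-contractive compressor
    with delta = k / len(v), i.e. ||C(v) - v||^2 <= (1 - k/d) ||v||^2."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest-magnitude entries
    out[idx] = v[idx]
    return out

def sa_pef_step(x, e, grad_fn, eta, alpha, k):
    """One hypothetical SA-PEF-style update (illustrative sketch only).
    alpha = 0 reduces to classic error feedback; alpha = 1 evaluates the
    gradient at a fully step-ahead, residual-corrected point."""
    g = grad_fn(x - alpha * eta * e)  # step-ahead correction (assumed form)
    p = g + e                         # error-compensated message
    delta = top_k(p, k)               # compressed update that gets communicated
    e_new = p - delta                 # residual kept locally (error feedback)
    x_new = x - eta * delta           # model step with the compressed direction
    return x_new, e_new

# Example: one step on the toy quadratic f(x) = 0.5 * ||x||^2 (grad = x).
x, e = np.ones(10), np.zeros(10)
x, e = sa_pef_step(x, e, grad_fn=lambda z: z, eta=0.1, alpha=0.5, k=3)
```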