We consider the problem of finding second-order stationary points in heterogeneous federated learning (FL). Previous work in FL mostly focuses on first-order convergence guarantees, which do not rule out convergence to unstable saddle points. Meanwhile, achieving communication efficiency without compromising learning accuracy is a key bottleneck in FL, especially when local data are highly heterogeneous across clients. To address this, we propose Power-EF, a novel algorithm that communicates only compressed information via a new error-feedback scheme. To our knowledge, Power-EF is the first distributed and compressed SGD algorithm that provably escapes saddle points in heterogeneous FL without any data homogeneity assumptions. In particular, after visiting first-order (possibly saddle) points, Power-EF reaches second-order stationary points using additional gradient queries and communication rounds of almost the same order as those required for first-order convergence, and its convergence rate exhibits a linear speedup in the number of workers. Our theory improves or recovers previous results while extending them to much less restrictive assumptions on the local data. Numerical experiments are provided to complement the theory.
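The abstract does not spell out Power-EF's update rule, so the sketch below is only a point of reference: classical error feedback (the EF-SGD pattern) combined with a top-k compressor and noisy gradients, on a hypothetical toy problem. It is not Power-EF. The toy global loss f(x) = 0.5·x0² − 0.5·x1² + 0.25·x1⁴ has a strict saddle at the origin (a first-order stationary point whose Hessian has a negative eigenvalue) and minima at (0, ±1). Each worker's loss adds a hypothetical linear bias b_i to model data heterogeneity; the biases sum to zero, so the averaged loss is f. All names, the step size, the noise level, and k are assumptions chosen for illustration.

```python
import random


def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest (a contractive compressor)."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k]
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def local_grad(x, b):
    """Gradient of the hypothetical local loss f_i(x) = f(x) + b . x, where
    f(x) = 0.5*x0**2 - 0.5*x1**2 + 0.25*x1**4 (strict saddle at the origin)."""
    return [x[0] + b[0], -x[1] + x[1] ** 3 + b[1]]


def ef_compressed_sgd(x0, biases, lr=0.05, sigma=0.1, k=1, steps=2000, seed=0):
    """Generic error-feedback compressed SGD (illustrative; not the Power-EF update)."""
    rng = random.Random(seed)
    d, n = len(x0), len(biases)
    x = list(x0)
    errors = [[0.0] * d for _ in range(n)]  # per-worker error-feedback memory
    for _ in range(steps):
        msgs = []
        for i in range(n):
            g = local_grad(x, biases[i])
            # Gaussian noise stands in for stochastic sampling; it also pushes the
            # iterate off the saddle's stable manifold so it can escape.
            g = [gj + rng.gauss(0.0, sigma) for gj in g]
            p = [lr * gj + ej for gj, ej in zip(g, errors[i])]  # add back past residual
            c = top_k(p, k)                                    # only c is communicated
            errors[i] = [pj - cj for pj, cj in zip(p, c)]      # store what was dropped
            msgs.append(c)
        # The server averages the compressed messages and takes the step.
        x = [xj - sum(m[j] for m in msgs) / n for j, xj in enumerate(x)]
    return x


if __name__ == "__main__":
    biases = [(0.5, 0.2), (-0.5, -0.2), (0.3, -0.1), (-0.3, 0.1)]  # sum to zero
    # Start on the saddle's stable manifold (x1 = 0); the run is expected to end
    # near one of the minima (0, +1) or (0, -1).
    print(ef_compressed_sgd([0.5, 0.0], biases))
```

The residual stored in `errors[i]` is re-added before the next compression, so information dropped by `top_k` is delayed rather than lost; the averaged model therefore tracks an uncompressed SGD trajectory up to a bounded error, even though each worker's gradient stays nonzero at the optimum because of its bias.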