Communication is a bottleneck in distributed and federated learning, so algorithms that compress communication have attracted significant attention and are widely used in practice. In addition, the large number of clients, their high heterogeneity, and their limited availability lead to high client-variance. This paper addresses both issues together by proposing two compressed, client-variance-reduced methods, COFIG and FRECON. In the nonconvex setting, we prove an $O(\frac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^2}+\frac{(1+\omega)N^{2/3}}{S\epsilon^2})$ bound on the number of communication rounds of COFIG, where $N$ is the total number of clients, $S$ is the number of clients participating in each round, $\epsilon$ is the convergence error, and $\omega$ is the variance parameter associated with the compression operator. For FRECON, we prove an $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2})$ bound on the number of communication rounds. In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which, to the best of our knowledge, is also the first convergence result for compression schemes that do not communicate with all the clients in each round. We stress that neither COFIG nor FRECON needs to communicate with all the clients in each round, and that, in the regimes considered, their rates are either the first available or faster than existing ones for convex and nonconvex federated learning. Experimental results suggest that COFIG and FRECON outperform existing baselines empirically.
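To make the role of $\omega$ concrete, the following is a minimal sketch of one standard unbiased compressor, rand-$k$ sparsification. It is an illustration only: the abstract does not say which compressor is used, and the sketch assumes the usual definition in which an unbiased operator $\mathcal{C}$ satisfies $\mathbb{E}\|\mathcal{C}(x)-x\|^2 \le \omega\|x\|^2$. Under that assumption, rand-$k$ on a $d$-dimensional vector has $\omega = d/k - 1$. The function names `rand_k`, `omega_rand_k`, and `empirical_omega` are hypothetical and chosen for this example.

```python
import random


def rand_k(x, k, rng):
    """Rand-k sparsification (illustrative, not taken from the paper).

    Keeps k coordinates of x chosen uniformly at random and scales them
    by d/k, which makes the compressor unbiased: E[C(x)] = x.
    """
    d = len(x)
    kept = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]


def omega_rand_k(d, k):
    """Variance parameter of rand-k: E||C(x) - x||^2 = (d/k - 1) * ||x||^2."""
    return d / k - 1.0


def empirical_omega(x, k, trials, seed=0):
    """Monte Carlo estimate of E||C(x) - x||^2 / ||x||^2 for rand-k."""
    rng = random.Random(seed)
    sq_norm = sum(v * v for v in x)
    total = 0.0
    for _ in range(trials):
        c = rand_k(x, k, rng)
        total += sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    return total / trials / sq_norm
```

For example, sending $k = 2$ of $d = 10$ coordinates gives $\omega = 4$, so the factor $(1+\omega)$ in the bounds above equals $5$: each round carries fewer coordinates, and the round count grows accordingly.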