Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm, but it suffers from high communication cost when training large-scale machine learning models. Sign-based methods, such as SignSGD \cite{bernstein2018signsgd}, have been proposed as a biased gradient compression technique for reducing this cost. However, sign-based algorithms can diverge under heterogeneous data, which has motivated the development of more advanced techniques, such as the error-feedback method and stochastic sign-based compression, to fix this issue. Nevertheless, these methods still suffer from slower convergence rates, and none of them allows multiple local SGD updates as in FedAvg \cite{mcmahan2017communication}. In this paper, we propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression, which not only allows one to flexibly control the tradeoff between gradient bias and convergence performance, but also provides a unified view of existing stochastic sign-based methods. More importantly, the unified noisy perturbation scheme enables the development of the first sign-based FedAvg algorithm ($z$-SignFedAvg) to accelerate convergence. Theoretically, we show that $z$-SignFedAvg achieves a faster convergence rate than existing sign-based methods and, under uniformly distributed noise, attains the same convergence rate as its uncompressed counterpart. Extensive experiments demonstrate that $z$-SignFedAvg achieves competitive empirical performance on real datasets and outperforms existing schemes.
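
To illustrate why adding symmetric noise before taking the sign can remove the bias of sign-based compression, consider the following one-dimensional sketch. The scalar $x$, the noise scale $\sigma > 0$, and the choice $\xi \sim \mathrm{Unif}[-1,1]$ are assumptions made here for illustration only and are not notation taken from the paper. For $|x| \le \sigma$,
\[
\mathrm{E}\bigl[\mathrm{sign}(x + \sigma\xi)\bigr]
= \Pr\bigl(\xi > -x/\sigma\bigr) - \Pr\bigl(\xi < -x/\sigma\bigr)
= \frac{1 + x/\sigma}{2} - \frac{1 - x/\sigma}{2}
= \frac{x}{\sigma},
\]
so $\sigma\,\mathrm{sign}(x + \sigma\xi)$ is an unbiased one-bit estimate of $x$ in this range, with variance $\sigma^2 - x^2$. A larger $\sigma$ widens the range over which the estimate is unbiased but increases the variance, which is one simple instance of the bias and convergence tradeoff described above.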