Federated Learning (FL) has attracted much interest due to the significant advantages it brings to training deep neural network (DNN) models. However, because communication and computation resources are limited, training DNN models in FL systems faces challenges such as elevated computational and communication costs in complex tasks. Sparse training schemes have gained increasing attention as a way to reduce the dimensionality of each client (i.e., node) transmission. Specifically, sparsification with error correction is a promising technique, in which only the important updates are sent to the parameter server (PS) and the rest are accumulated locally. While error correction methods have been shown to achieve a significant sparsification level of the client-to-PS message without harming convergence, pushing sparsity further remains unresolved due to the staleness effect. In this paper, we propose a novel algorithm, dubbed Federated Learning with Accumulated Regularized Embeddings (FLARE), to overcome this challenge. FLARE presents a novel sparse training approach via accumulated pulling of the updated models with regularization on the embeddings in the FL process, providing a powerful solution to the staleness effect and pushing sparsity to an exceptional level. The performance of FLARE is validated through extensive experiments on diverse and complex models, achieving a remarkable sparsity level (at least 10 times beyond the current state of the art) along with significantly improved accuracy. Additionally, an open-source software package has been developed for the benefit of researchers and developers in related fields.
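To make the baseline concrete, the following is a minimal sketch of client-side sparsification with error correction as described above: the client adds its locally accumulated residual to the new update, sends only the k largest-magnitude entries to the PS, and keeps the rest for later rounds. It illustrates the generic top-k scheme only, not FLARE itself; the function name `sparsify_with_error_feedback`, the top-k importance criterion, and the flat-list representation of an update are assumptions made for illustration.

```python
def sparsify_with_error_feedback(update, residual, k):
    """One client round of top-k sparsification with error correction.

    update   -- this round's local model update (flat list of floats)
    residual -- entries withheld in earlier rounds (same length)
    k        -- number of entries sent to the parameter server

    Returns (message, new_residual). All names are hypothetical.
    """
    if len(update) != len(residual):
        raise ValueError("update and residual must have the same length")
    if not 1 <= k <= len(update):
        raise ValueError("k must be between 1 and len(update)")

    # Error correction: add back what was withheld in earlier rounds.
    corrected = [u + r for u, r in zip(update, residual)]

    # Assumption: "important" means largest magnitude (top-k).
    order = sorted(range(len(corrected)),
                   key=lambda i: abs(corrected[i]), reverse=True)
    keep = set(order[:k])

    # Sent entries go into the message; the rest accumulate locally.
    message = [c if i in keep else 0.0 for i, c in enumerate(corrected)]
    new_residual = [0.0 if i in keep else c for i, c in enumerate(corrected)]
    return message, new_residual


# Example: send 2 of 5 entries and keep the other 3 as residual.
update = [0.1, -2.0, 0.3, 1.5, -0.2]
message, residual = sparsify_with_error_feedback(update, [0.0] * 5, k=2)
```

Entries that stay in the residual for many rounds are applied to a model that has since moved on, which is one way to picture the staleness effect that limits how far sparsity can be pushed with this scheme.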