Federated Learning (FL) has attracted much interest due to the significant advantages it brings to training deep neural network (DNN) models. However, since communication and computation resources are limited, training DNN models in FL systems faces challenges such as elevated computational and communication costs in complex tasks. Sparse training schemes have gained increasing attention as a way to scale down the dimensionality of each client's (i.e., node's) transmission. Specifically, sparsification with error correction is a promising technique, in which only the important updates are sent to the parameter server (PS) and the rest are accumulated locally. While error correction methods have been shown to achieve a significant sparsification level of the client-to-PS message without harming convergence, pushing sparsity further remains unresolved due to the staleness effect. In this paper, we propose a novel algorithm, dubbed Federated Learning with Accumulated Regularized Embeddings (FLARE), to overcome this challenge. FLARE presents a novel sparse training approach via accumulated pulling of the updated models with regularization on the embeddings in the FL process, providing a powerful solution to the staleness effect and pushing sparsity to an exceptional level. The performance of FLARE is validated through extensive experiments on diverse and complex models, achieving a remarkable sparsity level (10 times and more beyond the current state-of-the-art) along with significantly improved accuracy. Additionally, an open-source software package has been developed for the benefit of researchers and developers in related fields.
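The baseline the abstract builds on, sparsification with error correction, can be illustrated with a minimal NumPy sketch of one client step of top-k sparsification with error feedback: the locally accumulated residual is added to the fresh update, only the k largest-magnitude entries are sent to the PS, and the remainder is kept locally for the next round. This is a hedged illustration of the classical error-feedback scheme, not the FLARE algorithm itself; the function and variable names are our own.

```python
import numpy as np

def topk_with_error_feedback(update, residual, k):
    """One client step of top-k sparsification with error feedback.

    update:   the client's fresh model update (flattened vector)
    residual: error accumulated locally from previous rounds
    k:        number of entries transmitted to the parameter server
    """
    corrected = update + residual                 # apply accumulated error
    idx = np.argsort(np.abs(corrected))[-k:]      # indices of the k largest magnitudes
    sparse_msg = np.zeros_like(corrected)
    sparse_msg[idx] = corrected[idx]              # message sent to the PS
    new_residual = corrected - sparse_msg         # untransmitted mass kept locally
    return sparse_msg, new_residual
```

Because `sparse_msg + new_residual` always equals the error-corrected update, no gradient information is discarded, only delayed; it is exactly this delay that produces the staleness effect at extreme sparsity levels.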