Federated Learning (FL) offers a promising framework for collaborative and privacy-preserving machine learning across distributed data sources. However, the substantial communication costs associated with FL significantly challenge its efficiency. Specifically, in each communication round, the communication cost scales linearly with the model's dimension, which presents a formidable obstacle, especially in large model scenarios. Despite various communication-efficient strategies, the intrinsic dimension-dependent communication cost remains a major bottleneck for current FL implementations. This paper proposes a novel dimension-free communication algorithm -- DeComFL, which leverages zeroth-order optimization techniques and reduces the communication cost from $\mathscr{O}(d)$ to $\mathscr{O}(1)$ by transmitting only a constant number of scalar values between clients and the server in each round, regardless of the dimension $d$ of the model parameters. Theoretically, for non-convex functions, we prove that our algorithm achieves state-of-the-art rates, which exhibit a linear speedup in the number of clients and local steps under standard assumptions. Under an additional low-effective-rank assumption, we further show that the convergence rate is independent of the model dimension $d$ as well. Empirical evaluations, encompassing both classic deep learning training and large language model fine-tuning, demonstrate significant reductions in communication overhead. Notably, DeComFL achieves this by transmitting only around 1MB of data in total between the server and a client to fine-tune a model with billions of parameters.
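To make the $\mathscr{O}(d) \to \mathscr{O}(1)$ reduction concrete, the sketch below illustrates the general idea behind scalar-only communication via zeroth-order optimization: because the random perturbation direction $z$ can be regenerated from a shared seed, a client need only transmit the scalar finite-difference estimate of the directional derivative, and the server reconstructs the full $d$-dimensional update locally. This is a minimal single-client toy on a quadratic objective, not the DeComFL algorithm itself; the function and parameter names (`zo_scalar_round`, `eps`, `lr`) are illustrative assumptions.

```python
import numpy as np

def zo_scalar_round(x, loss_fn, seed, eps=1e-3, lr=0.1):
    """One communication round of a seed-based zeroth-order update (toy sketch).

    The client evaluates the loss at two perturbed points and would send back
    ONLY the scalar directional-derivative estimate `g`; any party holding the
    same seed regenerates the perturbation `z` locally, so no d-dimensional
    vector needs to cross the wire.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(x.shape)   # shared via the seed, never transmitted
    # Two-point finite-difference estimate of the directional derivative (a scalar).
    g = (loss_fn(x + eps * z) - loss_fn(x - eps * z)) / (2 * eps)
    # Server-side reconstruction: same seed -> same z -> full model update.
    return x - lr * g * z, g

# Toy objective: minimize ||x||^2 in d = 5 dimensions.
loss = lambda x: float(np.dot(x, x))
x = np.ones(5)
for r in range(200):
    x, g = zo_scalar_round(x, loss, seed=r)  # per-round seed stands in for a shared seed schedule
```

Here each round's "uplink" payload is the single float `g` (plus an agreed-upon seed), independent of the model dimension, which is the mechanism that removes the $d$-dependence from the per-round communication cost.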