Federated Learning (FL) offers a promising framework for collaborative and privacy-preserving machine learning across distributed data sources. However, the substantial communication costs associated with FL significantly challenge its efficiency. Specifically, in each communication round, the communication cost scales linearly with the model's dimension, which presents a formidable obstacle, especially in large model scenarios. Despite various communication-efficient strategies, the intrinsic dimension-dependent communication cost remains a major bottleneck for current FL implementations. This paper proposes a novel dimension-free communication algorithm -- DeComFL, which leverages zeroth-order optimization techniques and reduces the communication cost from $\mathscr{O}(d)$ to $\mathscr{O}(1)$ by transmitting only a constant number of scalar values between clients and the server in each round, regardless of the dimension $d$ of the model parameters. Theoretically, for non-convex functions, we prove that our algorithm achieves state-of-the-art rates, which exhibit a linear speedup in the number of clients and local steps under standard assumptions. Under an additional low-effective-rank assumption, we further show that the convergence rate is independent of the model dimension $d$ as well. Empirical evaluations, encompassing both classic deep learning training and large language model fine-tuning, demonstrate significant reductions in communication overhead. Notably, DeComFL achieves this by transmitting only around 1MB of data in total between the server and a client to fine-tune a model with billions of parameters.
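To make the $\mathscr{O}(d) \to \mathscr{O}(1)$ reduction concrete, the sketch below illustrates the general idea behind scalar-only communication via zeroth-order optimization: because the random perturbation direction $z$ can be regenerated from a shared seed, a client need only transmit the scalar finite-difference estimate of the directional derivative, and the server reconstructs the full $d$-dimensional update locally. This is a minimal single-client toy on a quadratic objective, not the DeComFL algorithm itself; the function and parameter names (`zo_scalar_round`, `eps`, `lr`) are illustrative assumptions.

```python
import numpy as np

def zo_scalar_round(x, loss_fn, seed, eps=1e-3, lr=0.1):
    """One communication round of a seed-based zeroth-order update (toy sketch).

    The client evaluates the loss at two perturbed points and would send back
    ONLY the scalar directional-derivative estimate `g`; any party holding the
    same seed regenerates the perturbation `z` locally, so no d-dimensional
    vector needs to cross the wire.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(x.shape)   # shared via the seed, never transmitted
    # Two-point finite-difference estimate of the directional derivative (a scalar).
    g = (loss_fn(x + eps * z) - loss_fn(x - eps * z)) / (2 * eps)
    # Server-side reconstruction: same seed -> same z -> full model update.
    return x - lr * g * z, g

# Toy objective: minimize ||x||^2 in d = 5 dimensions.
loss = lambda x: float(np.dot(x, x))
x = np.ones(5)
for r in range(200):
    x, g = zo_scalar_round(x, loss, seed=r)  # per-round seed stands in for a shared seed schedule
```

Here each round's "uplink" payload is the single float `g` (plus an agreed-upon seed), independent of the model dimension, which is the mechanism that removes the $d$-dependence from the per-round communication cost.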