The confluence of Federated Learning (FL) and Large Language Models (LLMs) is ushering in a new era of privacy-preserving natural language processing. However, the intensive memory requirements of fine-tuning LLMs pose significant challenges, especially on clients with limited computational resources. To circumvent this, we explore the integration of Memory-efficient Zeroth-Order Optimization (MeZO) within a federated setting, a combination we term FedMeZO. Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs, tackling three key questions: how large parameter spaces influence optimization behavior, how convergence properties can be established, and which parameters are critical for convergence and can therefore inform personalized federated strategies. Our extensive empirical evidence supports the theory, showing that FedMeZO not only converges faster than traditional first-order methods such as FedAvg but also significantly reduces GPU memory usage during training to levels comparable to those during inference. Moreover, the proposed personalized FL strategy, which builds on these theoretical insights to customize the client-wise learning rate, effectively accelerates loss reduction. We hope our work helps bridge the theoretical and practical aspects of federated fine-tuning for LLMs, thereby stimulating further advancements and research in this area.
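To make the idea of zeroth-order optimization in a federated setting concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-point (forward-only) gradient estimate with a perturbation regenerated from a random seed, parameter averaging on the server, and a small linear model standing in for an LLM. The names `zo_local_step`, `fed_round`, `lrs`, and `run_demo` are hypothetical and chosen for illustration only; the per-client learning rates in `lrs` merely show where a client-wise rate would plug in.

```python
import numpy as np

def loss(theta, X, y):
    """Mean squared error of a linear model (a stand-in for an LLM loss)."""
    r = X @ theta - y
    return float(np.mean(r ** 2))

def zo_local_step(theta, X, y, lr, eps, seed):
    """One two-point zeroth-order step; modifies theta in place.

    Only forward evaluations of the loss are used. The perturbation z is
    regenerated from `seed` for the update, illustrating how a real
    implementation could avoid keeping z in memory.
    """
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    theta += eps * z                      # evaluate at theta + eps * z
    loss_plus = loss(theta, X, y)
    theta -= 2 * eps * z                  # evaluate at theta - eps * z
    loss_minus = loss(theta, X, y)
    theta += eps * z                      # restore the original theta
    g = (loss_plus - loss_minus) / (2 * eps)  # scalar directional derivative
    z = np.random.default_rng(seed).standard_normal(theta.shape)  # same z again
    theta -= lr * g * z
    return theta

def fed_round(global_theta, clients, lrs, eps, local_steps, round_idx):
    """Each client runs local zeroth-order steps; the server averages results."""
    updated = []
    for cid, ((X, y), lr) in enumerate(zip(clients, lrs)):
        theta = global_theta.copy()
        for s in range(local_steps):
            # Assumed seeding scheme: any distinct non-negative integer works.
            seed = round_idx * 1_000_003 + cid * 1009 + s
            zo_local_step(theta, X, y, lr, eps, seed)
        updated.append(theta)
    return np.mean(updated, axis=0)

def run_demo(rounds=200):
    """Train on synthetic, noiseless data split across four clients."""
    rng = np.random.default_rng(0)
    true_theta = rng.standard_normal(5)
    clients = []
    for _ in range(4):
        X = rng.standard_normal((32, 5))
        clients.append((X, X @ true_theta))
    lrs = [0.02, 0.02, 0.02, 0.02]        # client-wise learning rates (assumed)

    def avg_loss(t):
        return float(np.mean([loss(t, X, y) for X, y in clients]))

    theta = np.zeros(5)
    start = avg_loss(theta)
    for r in range(rounds):
        theta = fed_round(theta, clients, lrs, eps=1e-3,
                          local_steps=5, round_idx=r)
    return start, avg_loss(theta)

if __name__ == "__main__":
    start, end = run_demo()
    print(f"average loss: {start:.4f} -> {end:.6f}")
```

Because each step needs only two forward passes and a seed, no gradients or optimizer states have to be stored, which is the intuition behind training memory approaching inference memory. In this toy setting the variance of the estimate grows with the parameter dimension, so the learning rate here is kept small; values suitable for an actual LLM would differ.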