The confluence of Federated Learning (FL) and Large Language Models (LLMs) is ushering in a new era in privacy-preserving natural language processing. However, the intensive memory requirements of fine-tuning LLMs pose significant challenges, especially on edge devices with limited computational resources. To address this, we explore the integration of Memory-efficient Zeroth-Order Optimization (MeZO) within a federated setting, a combination we denote as FedMeZO. Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs. It tackles three key questions: how large parameter spaces influence optimization behavior, what convergence properties can be established, and which parameters are critical for convergence and can therefore inform personalized federated strategies. Our extensive empirical evidence supports the theory, showing that FedMeZO not only converges faster than traditional first-order methods such as SGD but also significantly reduces GPU memory usage during training to levels comparable to those during inference. Moreover, the proposed personalized FL strategy, which builds on the theoretical insights to customize the client-wise learning rate, can effectively accelerate loss reduction. We hope our work helps bridge the theoretical and practical aspects of federated fine-tuning for LLMs and facilitates further development and research.
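To make the idea of zeroth-order optimization in a federated setting concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy linear least-squares objective in place of an LLM, plain model averaging on the server, and per-client learning rates chosen arbitrarily for illustration. All function names (`loss`, `perturb`, `zo_step`, `federated_round`, `run_demo`) and all hyperparameter values are hypothetical. The sketch shows the two ingredients the abstract relies on: a gradient estimate built from two forward evaluations only, and a perturbation direction that is regenerated from a random seed rather than stored, which is what keeps training memory close to inference memory.

```python
import random


def loss(theta, data):
    """Toy objective (assumed): mean squared error of a linear model."""
    total = 0.0
    for xs, y in data:
        pred = sum(w * x for w, x in zip(theta, xs))
        total += (pred - y) ** 2
    return total / len(data)


def perturb(theta, seed, scale):
    """Add scale * z to theta in place, regenerating Gaussian z from the seed.

    Because z is rebuilt from the seed on every call, it is never stored.
    """
    rng = random.Random(seed)
    for i in range(len(theta)):
        theta[i] += scale * rng.gauss(0.0, 1.0)


def zo_step(theta, data, lr, eps, seed):
    """One two-point zeroth-order SGD step using only forward evaluations."""
    perturb(theta, seed, eps)           # theta + eps * z
    loss_plus = loss(theta, data)
    perturb(theta, seed, -2.0 * eps)    # theta - eps * z
    loss_minus = loss(theta, data)
    perturb(theta, seed, eps)           # back to theta (up to rounding)
    grad_scalar = (loss_plus - loss_minus) / (2.0 * eps)
    perturb(theta, seed, -lr * grad_scalar)  # theta -= lr * grad_scalar * z
    return grad_scalar


def federated_round(global_theta, client_data, client_lrs, eps, local_steps, round_id):
    """Each client runs local ZO steps; the server averages the results."""
    client_models = []
    for cid, data in enumerate(client_data):
        theta = list(global_theta)  # client starts from the global model
        for step in range(local_steps):
            seed = (round_id * 1000 + cid) * 1000 + step + 1  # distinct per step
            zo_step(theta, data, client_lrs[cid], eps, seed)
        client_models.append(theta)
    n = len(client_models)
    return [sum(column) / n for column in zip(*client_models)]


def run_demo(rounds=30, local_steps=5):
    """Return (initial_loss, final_loss), each averaged over clients."""
    rng = random.Random(2024)
    true_w = [1.0, -2.0, 0.5, 3.0]  # hypothetical ground-truth weights
    client_data = []
    for _ in range(3):
        samples = []
        for _ in range(20):
            xs = [rng.uniform(-1.0, 1.0) for _ in true_w]
            samples.append((xs, sum(w * x for w, x in zip(true_w, xs))))
        client_data.append(samples)
    client_lrs = [0.02, 0.02, 0.01]  # hypothetical client-wise learning rates
    theta = [0.0] * len(true_w)

    def mean_loss(t):
        return sum(loss(t, d) for d in client_data) / len(client_data)

    initial = mean_loss(theta)
    for r in range(rounds):
        theta = federated_round(theta, client_data, client_lrs, 1e-3, local_steps, r)
    return initial, mean_loss(theta)


if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` returns the client-averaged loss before and after training. The `client_lrs` list only marks where a client-wise learning rate would plug in; how such rates should be chosen is the subject of the paper's analysis and is not reproduced here.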