Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. Federated learning offers a way to fine-tune LLMs on the abundant data held by end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance attainable with full-parameter tuning. Federated full-parameter tuning of LLMs, however, is a non-trivial problem because of its immense communication cost. This work introduces FedKSeed, which employs zeroth-order optimization with a finite set of random seeds. It reduces the transmission between the server and clients to a few random seeds and scalar gradients, amounting to only a few thousand bytes, which makes federated full-parameter tuning of billion-parameter LLMs possible on devices. Building on FedKSeed, we develop a strategy for probability-differentiated seed sampling that prioritizes perturbations with greater impact on model accuracy. Experiments across six scenarios with various LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in both communication efficiency and generalization to new tasks.
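The core idea, zeroth-order optimization over a finite pool of random seeds, can be illustrated with a minimal sketch. This is not the authors' implementation: a toy quadratic loss stands in for an LLM, seeds are drawn uniformly rather than with the probability-differentiated strategy, and all names (`perturbation`, `scalar_gradient`, `apply_update`, `replay`, `simulate`) and hyperparameters are hypothetical. It shows only that a (seed, scalar gradient) pair is enough to reproduce a full-parameter update, because the perturbation direction can be regenerated from the seed.

```python
import random

def perturbation(seed, dim):
    """Regenerate the Gaussian direction z identified by `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def scalar_gradient(loss_fn, w, seed, eps=1e-3):
    """Two-point zeroth-order estimate of the directional derivative along z."""
    z = perturbation(seed, len(w))
    w_plus = [wi + eps * zi for wi, zi in zip(w, z)]
    w_minus = [wi - eps * zi for wi, zi in zip(w, z)]
    return (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)

def apply_update(w, seed, g, lr):
    """Apply one update step using only the seed and the scalar g."""
    z = perturbation(seed, len(w))
    return [wi - lr * g * zi for wi, zi in zip(w, z)]

def replay(w0, history, lr):
    """Rebuild the weights from the initial point and the (seed, scalar) log."""
    w = list(w0)
    for seed, g in history:
        w = apply_update(w, seed, g, lr)
    return w

def simulate(dim=10, num_seeds=32, steps=200, lr=0.01):
    """Toy run: quadratic loss, uniform sampling from a fixed seed pool (assumed setup)."""
    target = [1.0] * dim
    loss_fn = lambda w: 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, target))
    seed_pool = list(range(num_seeds))   # the finite set of candidate seeds
    sampler = random.Random(0)           # uniform sampling, for illustration only
    w0 = [0.0] * dim
    w, history = list(w0), []
    for _ in range(steps):
        seed = sampler.choice(seed_pool)
        g = scalar_gradient(loss_fn, w, seed)
        history.append((seed, g))        # the only data that would be transmitted
        w = apply_update(w, seed, g, lr)
    return w0, w, history, loss_fn
```

In this sketch, any party holding the initial weights and the `history` list can call `replay` to obtain the same weights as the party that ran the optimization, without any weight vector or full gradient being exchanged.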