Federated large language models (LLMs) have recently drawn significant attention because they couple the capabilities of LLMs with federated learning (FL) to address privacy concerns in collaborative fine-tuning. However, owing to the large parameter scale of LLMs, existing federated LLM fine-tuning frameworks face significant challenges on resource-constrained clients with heterogeneous computing capabilities and random wireless channels. To address this issue, we propose a joint client-specific pruning and bandwidth allocation (JCPBA) framework for federated LLMs that improves fine-tuning efficiency over wireless networks. Specifically, we formulate a fine-tuning latency minimization problem by jointly optimizing the client-specific pruning rates and bandwidth allocation. We then solve this optimization problem using a block coordinate descent method. Extensive experiments on the Yahoo Answers and GSM8K datasets demonstrate that the proposed framework significantly reduces wall-clock fine-tuning time compared with state-of-the-art baselines, while achieving equal or lower test loss with less computation and communication overhead.
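To make the alternating structure concrete, the following is a minimal Python sketch of a block coordinate descent loop of this kind: it alternates between allocating bandwidth for fixed pruning rates and updating per-client pruning rates for fixed bandwidth. The linear latency model, the accuracy-loss proxy `lam * p.mean()`, all constants, and the helper names (`allocate_bandwidth`, `update_pruning`) are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a BCD loop for joint client-specific pruning and bandwidth
# allocation. All quantities below are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)
K = 8                     # number of clients
W = 2e12                  # per-round compute workload per client (FLOPs), assumed
S = 8e8                   # uplink payload of a full model update (bits), assumed
B = 20e6                  # total uplink bandwidth budget (Hz), assumed
c = rng.uniform(5e11, 2e12, K)      # heterogeneous compute speeds (FLOPs/s)
r = rng.uniform(2.0, 6.0, K)        # per-client spectral efficiency (bits/s/Hz)
lam = 2.0                 # weight trading latency against pruning-induced loss
P_GRID = np.linspace(0.0, 0.8, 81)  # feasible pruning rates

def latencies(p, b):
    """Per-client round latency: local compute plus uplink, both shrunk by pruning."""
    t_cmp = W * (1.0 - p) / c
    t_com = S * (1.0 - p) / (b * r)
    return t_cmp + t_com

def objective(p, b):
    # Round latency is set by the slowest client; the mean pruning rate is a
    # crude proxy for the accuracy cost of pruning (an assumption of this sketch).
    return latencies(p, b).max() + lam * p.mean()

def allocate_bandwidth(p, iters=60):
    """Bandwidth block: bisection on a common deadline T. For fixed pruning
    rates, the latency-minimizing allocation equalizes finish times, so each
    client gets just enough bandwidth to meet T."""
    t_cmp = W * (1.0 - p) / c
    lo, hi = t_cmp.max() + 1e-9, 1e6
    for _ in range(iters):
        T = 0.5 * (lo + hi)
        need = S * (1.0 - p) / (r * (T - t_cmp))  # bandwidth needed to finish by T
        if need.sum() > B:
            lo = T        # deadline too tight: demand exceeds the budget B
        else:
            hi = T
    b = S * (1.0 - p) / (r * (hi - t_cmp))
    return b * (B / b.sum())  # scale up to spend the whole budget

def update_pruning(p, b):
    """Pruning block: coordinate-wise 1-D grid search on each client's rate."""
    p = p.copy()
    for k in range(K):
        vals = [objective(np.where(np.arange(K) == k, pk, p), b) for pk in P_GRID]
        p[k] = P_GRID[int(np.argmin(vals))]
    return p

# BCD loop: alternate the two blocks until the objective stops improving.
p = np.full(K, 0.3)
b = np.full(K, B / K)
for it in range(20):
    b = allocate_bandwidth(p)
    p = update_pruning(p, b)
    print(f"iter {it:2d}  objective = {objective(p, b):.3f} s")
```

The bisection in the bandwidth block exploits a standard observation: with pruning rates fixed, the minimum achievable round latency is reached when all clients finish simultaneously, which reduces the block to a one-dimensional search over the shared deadline.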