Federated large language models (LLMs) have recently drawn significant attention because they couple the capabilities of LLMs with federated learning (FL) to address privacy concerns in collaborative fine-tuning. However, owing to the large parameter scale of LLMs, existing federated LLM fine-tuning frameworks face significant challenges on resource-constrained clients with heterogeneous computing capabilities and random wireless channels. To address this issue, we propose a joint client-specific pruning and bandwidth allocation (JCPBA) framework for federated LLMs that improves fine-tuning efficiency over wireless networks. Specifically, we formulate a fine-tuning latency minimization problem that jointly optimizes the pruning rates and bandwidth allocation. We then solve this optimization problem using a block coordinate descent method. Extensive experiments on the Yahoo Answers and GSM8K datasets demonstrate that the proposed framework significantly reduces wall-clock fine-tuning time compared with state-of-the-art baselines, and achieves equal or lower test loss while incurring lower computation and communication overhead.
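To make the alternating structure of the proposed optimization concrete, the following is a minimal sketch of a block coordinate descent loop over the two variable blocks named in the abstract: client-specific pruning rates and bandwidth allocation. The latency model (compute time plus upload time per round), the quadratic pruning penalty standing in for convergence slowdown, the proportional bandwidth heuristic, and all numerical values are illustrative assumptions for this sketch, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical per-client parameters (not from the paper): compute load and
# speed, model upload size, channel quality, and a total bandwidth budget.
rng = np.random.default_rng(0)
K = 8                                  # number of clients
flops = rng.uniform(1e12, 4e12, K)     # FLOPs per local fine-tuning round
speed = rng.uniform(1e11, 5e11, K)     # heterogeneous compute speed (FLOPs/s)
bits = rng.uniform(1e8, 4e8, K)        # bits of trainable parameters to upload
se = rng.uniform(1.0, 4.0, K)          # spectral efficiency (bits/s/Hz), random channels
B_total = 1e7                          # total uplink bandwidth (Hz)
p_max, mu = 0.8, 5.0                   # max pruning rate, pruning penalty weight

def objective(p, b):
    """Synchronous-round latency (max over clients) plus a convex penalty that
    stands in for the convergence slowdown caused by aggressive pruning."""
    t_comp = (1.0 - p) * flops / speed
    t_comm = (1.0 - p) * bits / (se * b)
    return np.max(t_comp + t_comm) + mu * np.sum(p ** 2)

p = np.zeros(K)                        # block 1: client-specific pruning rates
b = np.full(K, B_total / K)            # block 2: bandwidth allocation
grid = np.linspace(0.0, p_max, 81)
for it in range(20):
    # Block 1: with bandwidth fixed, update each pruning rate by 1-D grid search.
    for k in range(K):
        vals = [objective(np.where(np.arange(K) == k, x, p), b) for x in grid]
        p[k] = grid[int(np.argmin(vals))]
    # Block 2: with pruning rates fixed, split bandwidth in proportion to each
    # client's remaining communication load so upload times roughly equalize.
    load = (1.0 - p) * bits / se
    b = B_total * load / load.sum()
    print(f"iter {it:2d}: objective = {objective(p, b):.3f}")
```

Under this toy model the loop behaves as one would expect from the abstract: straggling clients (slow compute or poor channels) are assigned higher pruning rates and larger bandwidth shares, which shrinks the per-round wall-clock latency across iterations.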