Pre-trained Language Models (PLMs) have demonstrated their superiority and versatility in modern Natural Language Processing (NLP), effectively adapting to various downstream tasks through further fine-tuning. Federated Parameter-Efficient Fine-Tuning (FedPEFT) has emerged as a promising solution to address privacy and efficiency challenges in distributed training of PLMs on resource-constrained local devices. However, our measurements reveal two key limitations of FedPEFT: heterogeneous data across devices exacerbates the performance degradation of low-rank adaptation, and a fixed parameter configuration results in communication inefficiency. To overcome these limitations, we propose FedARA, a novel adaptive rank allocation framework for federated parameter-efficient fine-tuning of language models. Specifically, FedARA employs truncated Singular Value Decomposition (SVD) adaptation to encourage similar feature representations across clients, significantly mitigating the adverse effects of data heterogeneity. Subsequently, it utilizes dynamic rank allocation to progressively identify critical ranks, effectively improving communication efficiency. Lastly, it leverages rank-based module pruning to automatically remove inactive modules, steadily reducing local computational cost and memory usage in each federated learning round. Extensive experiments show that FedARA consistently outperforms baselines by an average of 6.95% to 8.49% across various datasets and models under heterogeneous data, while significantly improving communication efficiency by 2.40$\times$. Moreover, experiments on various edge devices demonstrate substantial decreases in total training time and energy consumption of up to 48.90% and 46.95%, respectively.
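To make the SVD-based adaptation concrete, the sketch below shows one way an adapter of this kind can be parameterised in PyTorch: the frozen base weight receives a low-rank update factorised as $P\,\mathrm{diag}(\lambda)\,Q$, and ranks whose singular values become negligible can be masked out, in the spirit of the dynamic rank allocation and rank-based pruning described above. This is a minimal illustration under our own assumptions, not the authors' released implementation; the names `SVDAdapter`, `prune_ranks`, `rank`, and `threshold` are hypothetical.

```python
# Minimal sketch of an SVD-parameterised adapter (illustrative, not FedARA's code).
import torch
import torch.nn as nn


class SVDAdapter(nn.Module):
    """Linear layer with a frozen base weight and an SVD-style low-rank update."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                           # frozen pre-trained weight
        self.P = nn.Parameter(torch.randn(out_features, rank) * 0.01)    # left singular factor
        self.lam = nn.Parameter(torch.zeros(rank))                       # singular values (zero-initialised)
        self.Q = nn.Parameter(torch.randn(rank, in_features) * 0.01)     # right singular factor
        self.register_buffer("rank_mask", torch.ones(rank))              # which ranks remain active

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # delta_W = P @ diag(lam * mask) @ Q, applied on top of the frozen base weight
        delta = self.P @ torch.diag(self.lam * self.rank_mask) @ self.Q
        return self.base(x) + x @ delta.T

    @torch.no_grad()
    def prune_ranks(self, threshold: float = 1e-3) -> int:
        # Deactivate ranks whose singular values are negligible; returns the active-rank count.
        self.rank_mask.copy_((self.lam.abs() >= threshold).float())
        return int(self.rank_mask.sum().item())


if __name__ == "__main__":
    layer = SVDAdapter(in_features=768, out_features=768, rank=8)
    y = layer(torch.randn(4, 768))
    active = layer.prune_ranks(threshold=1e-3)
    print(y.shape, "active ranks:", active)
```

In a federated setting, only the small factors `P`, `lam`, and `Q` (restricted to active ranks) would need to be communicated each round, which is where the communication savings claimed above would come from.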