Low-rank adaptation (LoRA) is one of the most popular task-specific parameter-efficient fine-tuning (PEFT) methods for pre-trained language models, owing to its good performance and computational efficiency. LoRA injects a product of two trainable rank-decomposition matrices on top of each frozen pre-trained model module. However, when applied in the setting of privacy-preserving federated learning (FL), LoRA may become unstable for the following reasons: 1) the effects of data heterogeneity and multi-step local updates are non-negligible; 2) the additive noise applied to gradient updates to guarantee differential privacy (DP) can be amplified; and 3) the final performance is sensitive to hyper-parameters. A key factor behind these phenomena is the discordance between clients jointly optimizing the two low-rank matrices locally and the server aggregating them separately. Thus, this paper proposes an efficient and effective variant of LoRA, Federated Freeze A LoRA (FFA-LoRA), to alleviate these challenges and further halve the communication cost of federated fine-tuning of LLMs. The core idea of FFA-LoRA is to fix the randomly initialized non-zero matrices and fine-tune only the zero-initialized matrices. Compared to LoRA, FFA-LoRA is motivated by both practical and theoretical benefits in privacy-preserving FL. Our experiments demonstrate that FFA-LoRA delivers more consistent performance and better computational efficiency than vanilla LoRA across various FL tasks.
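The core idea above can be sketched in a few lines of NumPy. This is a minimal toy illustration, not the paper's implementation: the shapes, learning rate, and loss are hypothetical. A frozen pre-trained weight `W` is adapted by a low-rank product `B @ A`; vanilla LoRA trains both factors, while FFA-LoRA freezes the randomly initialized `A` and trains only the zero-initialized `B`, so clients' updates to `B` can be averaged directly by the server without the cross-term mismatch that arises when `A` and `B` are averaged separately.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2  # hypothetical layer sizes and LoRA rank

W = rng.standard_normal((d_out, d_in))   # frozen pre-trained weight
A = rng.standard_normal((r, d_in))       # frozen: random init, never updated (FFA-LoRA)
B = np.zeros((d_out, r))                 # trainable: zero init

def forward(x, B):
    """Adapted layer: (W + B A) x; at init B = 0, so the output equals W x."""
    return (W + B @ A) @ x

x = rng.standard_normal(d_in)
assert np.allclose(forward(x, B), W @ x)  # zero init leaves the model unchanged

# One toy gradient step on B only, for a dummy loss L = 0.5 * ||y - t||^2.
t = rng.standard_normal(d_out)
y = forward(x, B)
grad_B = np.outer(y - t, A @ x)          # dL/dB; A and W stay frozen
B = B - 0.01 * grad_B
```

Because only `B` is communicated and trained, averaging clients' `B` matrices on the server is equivalent to averaging the full low-rank updates `B @ A` (the frozen `A` is shared), and the per-round payload is half that of vanilla LoRA.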