Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, particularly in task generalization for both text and vision data. While fine-tuning these models can significantly enhance their performance on specific downstream tasks, doing so often requires high-quality data that cannot be shared due to privacy concerns. Federated Learning (FL) offers a promising solution for collaborative training without direct data sharing. However, many parameter-efficient fine-tuning strategies for LLMs in FL, particularly those based on Low-Rank Adaptation (LoRA), face significant limitations. In this paper, we critically analyze the convergence and performance guarantees of popular FL frameworks built on LoRA and show that they are suboptimal because the low-rank matrices confine learning to a constrained subspace. This limitation hinders effective fine-tuning of LLMs in federated settings. Through rigorous analytical and empirical evaluations, we demonstrate that direct weight averaging outperforms LoRA-based strategies, yielding superior performance for fine-tuned models. Our comprehensive comparison exposes inefficiencies in LoRA approaches and underscores the advantages of direct weight aggregation. We further extend our analysis to low-rank gradient-based optimizers, such as GaLore, applied during local training. Our findings show that GaLore combined with direct weight aggregation is a more effective approach, outperforming federated LoRA methods such as FlexLoRA and FFA-LoRA across both text and image modalities. While privacy remains central to FL discourse, our focus is on assessing the performance of federated fine-tuned models and on evaluating various FL frameworks from both theoretical and empirical perspectives. Our findings advocate reassessing the reliance on LoRA in FL settings, paving the way for more efficient training methodologies.
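To make the constrained-subspace issue concrete, the following is a minimal NumPy sketch (our own illustration, not code from the paper): each client i holds a LoRA update W_i = B_i A_i, and averaging the factors A_i and B_i server-side, as naive federated LoRA aggregation does, generally differs from the average of the effective updates W_i, which direct weight aggregation recovers exactly. All dimensions and variable names here are hypothetical.

```python
import numpy as np

# Sketch of the LoRA aggregation mismatch: with LoRA, client i learns a
# low-rank update W_i = B_i @ A_i. Averaging the factors server-side does
# NOT equal the average of the clients' effective updates, whereas direct
# weight averaging matches it by construction.

rng = np.random.default_rng(0)
d, r, num_clients = 8, 2, 4  # hypothetical: full dim, LoRA rank, client count

A = [rng.normal(size=(r, d)) for _ in range(num_clients)]
B = [rng.normal(size=(d, r)) for _ in range(num_clients)]

# Target: the average of the clients' effective weight updates.
avg_update = sum(b @ a for a, b in zip(A, B)) / num_clients

# Naive LoRA aggregation: average the factors, then take their product.
lora_agg = (sum(B) / num_clients) @ (sum(A) / num_clients)

# Direct weight aggregation averages the full updates themselves, so it is exact.
direct_agg = avg_update

print("factor-averaging error:", np.linalg.norm(lora_agg - avg_update))   # nonzero
print("direct-averaging error:", np.linalg.norm(direct_agg - avg_update)) # zero
```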