Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, particularly in task generalization for both text and vision data. While fine-tuning these models can significantly enhance their performance on specific downstream tasks, it often requires high-quality data that cannot be shared due to privacy concerns. Federated Learning (FL) offers a promising solution for collaborative training without direct data sharing. However, many parameter-efficient fine-tuning strategies for LLMs in FL, particularly those based on Low-Rank Adaptation (LoRA), face limitations. In this paper, we critically analyze the convergence and performance guarantees of popular FL frameworks utilizing LoRA, highlighting its suboptimal nature due to constrained subspace learning of low-rank matrices. This limitation hinders effective fine-tuning of LLMs in federated settings. Through rigorous analytical and empirical evaluations, we demonstrate that direct weight averaging outperforms LoRA-based strategies, leading to superior performance for fine-tuned models. Our comprehensive comparison exposes inefficiencies in LoRA approaches and underscores the advantages of direct weight aggregation. We extend our analysis to low-rank gradient-based optimizers, such as GaLore, used during local training steps. Our findings show that GaLore is a more effective alternative, outperforming federated LoRA methods like FlexLoRA and FFA-LoRA across both text and image modalities. While privacy remains paramount in FL discourse, our focus is on assessing performance outcomes of federated fine-tuned models and evaluating various FL frameworks from both theoretical and empirical perspectives. Our findings advocate reassessing the reliance on LoRA within FL contexts, paving the way for more efficient training methodologies.