Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, particularly in task generalization for both text and vision data. While fine-tuning these models can significantly enhance their performance on specific downstream tasks, it often requires high-quality data that cannot be shared due to privacy concerns. Federated Learning (FL) offers a promising solution for collaborative training without direct data sharing. However, many parameter-efficient fine-tuning strategies for LLMs in FL, particularly those based on Low-Rank Adaptation (LoRA), face limitations. In this paper, we critically analyze the convergence and performance guarantees of popular FL frameworks built on LoRA, showing that LoRA is suboptimal because its low-rank matrices restrict learning to a constrained subspace. This limitation hinders effective fine-tuning of LLMs in federated settings. Through rigorous analytical and empirical evaluations, we demonstrate that direct weight averaging outperforms LoRA-based strategies, yielding superior performance for fine-tuned models. Our comprehensive comparison exposes inefficiencies in LoRA-based approaches and underscores the advantages of full-rank weight aggregation. We extend our analysis to low-rank gradient-based optimizers, such as GaLore, used during local training steps, and find that GaLore is a more effective alternative, outperforming federated LoRA methods such as FlexLoRA and FFA-LoRA across both text and image modalities. While privacy remains paramount in FL discourse, our focus here is on the performance of federated fine-tuned models, which we evaluate across FL frameworks from both theoretical and empirical perspectives. Our findings advocate reassessing the reliance on LoRA in federated settings, paving the way for more efficient training methodologies.