Federated LoRA provides a communication-efficient mechanism for fine-tuning large language models on decentralized data. In practice, however, a discrepancy between the factor-wise averaging used to preserve low rank and the mathematically correct aggregation of local updates can cause significant aggregation error and unstable training. We argue that a major source of this problem is rotational misalignment, which arises from the rotational invariance of low-rank factorizations: because $(B_i R_i)(R_i^\top A_i) = B_i A_i$ for any orthogonal $R_i$, semantically equivalent updates can be represented in different latent subspaces across clients. When such misaligned factors are averaged directly, they interfere destructively and degrade the global update. To address this issue, we propose FedRot-LoRA, a federated LoRA framework that aligns client updates via orthogonal transformations prior to aggregation. This alignment preserves the semantic update while reducing cross-client subspace mismatch, without increasing communication cost or restricting model expressivity. We provide a convergence analysis that examines the aggregation error induced by factor-wise averaging and shows how rotational alignment yields a tighter upper bound on this error. Extensive experiments on natural language understanding and generative tasks demonstrate that FedRot-LoRA consistently outperforms existing federated LoRA baselines across a range of heterogeneity levels and LoRA ranks.
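The rotational misalignment described in the abstract can be illustrated with a small numerical sketch. The abstract does not specify how FedRot-LoRA computes its orthogonal transformations, so the code below substitutes a standard orthogonal Procrustes alignment of each client's $A_i$ to a reference client. That choice, the helper names `procrustes_rotation`, `relative_error`, and `run_demo`, and the idealized setup in which every client holds the same update $BA$ in a differently rotated basis are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def procrustes_rotation(A_i, A_ref):
    """Orthogonal R (r x r) minimizing ||R.T @ A_i - A_ref||_F.

    Assumption: orthogonal Procrustes stands in for the alignment step;
    the abstract does not state how the rotations are actually chosen.
    """
    U, _, Vt = np.linalg.svd(A_i @ A_ref.T)
    return U @ Vt


def relative_error(approx, exact):
    """Frobenius-norm error of approx relative to exact."""
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)


def run_demo(d=16, k=12, r=4, n_clients=4, seed=0):
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((d, r))
    A = rng.standard_normal((r, k))

    # Each client holds the same update B @ A, expressed in a rotated basis:
    # (B R_i)(R_i^T A) = B A for any orthogonal R_i.
    clients = []
    for _ in range(n_clients):
        R_i, _ = np.linalg.qr(rng.standard_normal((r, r)))
        clients.append((B @ R_i, R_i.T @ A))

    # Exact aggregation: average of the products B_i A_i.
    exact = np.mean([B_i @ A_i for B_i, A_i in clients], axis=0)

    # Factor-wise averaging: product of the averaged factors.
    naive = (np.mean([B_i for B_i, _ in clients], axis=0)
             @ np.mean([A_i for _, A_i in clients], axis=0))

    # Rotate every client toward client 0's basis, then average factor-wise.
    A_ref = clients[0][1]
    aligned = []
    for B_i, A_i in clients:
        R = procrustes_rotation(A_i, A_ref)
        aligned.append((B_i @ R, R.T @ A_i))  # product B_i A_i is unchanged
    rotated = (np.mean([B_i for B_i, _ in aligned], axis=0)
               @ np.mean([A_i for _, A_i in aligned], axis=0))

    return relative_error(naive, exact), relative_error(rotated, exact)


if __name__ == "__main__":
    naive_err, aligned_err = run_demo()
    print(f"factor-wise averaging, no alignment:   {naive_err:.3e}")
    print(f"factor-wise averaging, with alignment: {aligned_err:.3e}")
```

In this idealized setting the aligned factors coincide across clients, so factor-wise averaging recovers the exact aggregate up to floating-point error, while the unaligned average does not. With genuinely heterogeneous client updates, alignment of this kind would be expected to reduce the aggregation error rather than eliminate it.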