FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models

Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Language Models (LLMs) without sharing local data. However, several methods designed for federated LoRA struggle to balance communication efficiency, model accuracy, and computational cost, particularly among heterogeneous clients. These methods either rely on simplistic averaging of local adapters, which introduces aggregation noise; require transmitting large stacked local adapters, which leads to poor communication efficiency; or necessitate reconstructing a memory-dense global weight-update matrix and performing a computationally expensive decomposition to design client-specific low-rank adapters. In this work, we propose FLoRIST, a federated fine-tuning framework that achieves mathematically accurate aggregation without incurring high communication or computational overhead. Instead of constructing the full global weight-update matrix at the server, FLoRIST employs an efficient decomposition pipeline that performs singular value decomposition (SVD) on the stacked local adapters separately. This approach operates within a compact intermediate space to represent the accumulated information from local LoRAs. We introduce tunable singular value thresholding for server-side optimal rank selection to construct a pair of global low-rank adapters shared by all clients. Extensive empirical evaluations across multiple datasets and LLMs demonstrate that FLoRIST consistently strikes the best balance between superior communication efficiency and competitive performance in both homogeneous and heterogeneous setups.
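
The aggregation idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `aggregate_adapters`, the per-client `weights`, and the threshold parameter `tau` are hypothetical, and the thresholding rule (keep the smallest rank whose singular values account for a fraction `tau` of their total sum) is an assumption, since the abstract does not specify the exact criterion. The sketch stacks the local adapters so that their product equals the weighted sum of client updates, decomposes each stack separately, and performs the final SVD on a small intermediate matrix whose size depends only on the total client rank, never forming the full weight-update matrix.

```python
import numpy as np


def aggregate_adapters(Bs, As, weights, tau=0.9):
    """Sketch of SVD-based aggregation of LoRA adapters (assumed details).

    Bs[k] has shape (d, r_k), As[k] has shape (r_k, n); ranks r_k may differ.
    Returns global adapters (B_global, A_global) and the selected rank.
    """
    # Stack so that B_stack @ A_stack == sum_k weights[k] * Bs[k] @ As[k].
    B_stack = np.hstack([w * B for w, B in zip(weights, Bs)])  # (d, R)
    A_stack = np.vstack(As)                                    # (R, n)

    # Decompose each stack separately; the (d, n) update is never built.
    U_b, s_b, Vt_b = np.linalg.svd(B_stack, full_matrices=False)
    U_a, s_a, Vt_a = np.linalg.svd(A_stack, full_matrices=False)

    # Compact intermediate matrix: diag(s_b) @ Vt_b @ U_a @ diag(s_a).
    M = (s_b[:, None] * Vt_b) @ (U_a * s_a[None, :])
    U_m, s_m, Vt_m = np.linalg.svd(M, full_matrices=False)

    # Assumed thresholding rule: smallest rank reaching a fraction tau
    # of the total singular-value sum.
    energy = np.cumsum(s_m) / np.sum(s_m)
    rank = min(int(np.searchsorted(energy, tau)) + 1, len(s_m))

    # Because U_b and Vt_a have orthonormal columns/rows, s_m holds the
    # singular values of the full update, so truncating here is a
    # truncated SVD of that update.
    B_global = (U_b @ U_m[:, :rank]) * s_m[:rank]  # (d, rank)
    A_global = Vt_m[:rank] @ Vt_a                  # (rank, n)
    return B_global, A_global, rank
```

With `tau=1.0` the product `B_global @ A_global` reproduces the weighted sum of client updates up to floating-point error; lowering `tau` trades reconstruction accuracy for a smaller rank and therefore a smaller adapter pair to send back to clients.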
