FeDeRA: Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition

Pre-trained Language Models (PLMs) have shown excellent performance on various downstream tasks after fine-tuning. Nevertheless, escalating concerns about user privacy pose significant challenges to centralized training, which relies on extensive data collection. Federated learning, in which clients train locally and the server aggregates their weights without any data being shared, has emerged as a solution. However, the large parameter count of PLMs places a heavy burden on the computational resources of client devices and also leads to costly communication. Introducing Parameter-Efficient Fine-Tuning (PEFT) into federated learning can effectively address this problem. However, we observe that non-IID data in federated learning leads to a performance gap between PEFT methods and full-parameter fine-tuning (FFT). To overcome this, we propose FeDeRA, an improvement over the Low-Rank Adaptation (LoRA) method in federated learning. FeDeRA uses the same adapter module as LoRA but differs in its initialization: it performs Singular Value Decomposition (SVD) on the pre-trained weight matrix and selects the principal components to initialize the adapter module. We conducted extensive experiments with RoBERTa and DeBERTaV3 on six datasets, comparing FeDeRA against FFT and three other PEFT methods. FeDeRA outperforms all other PEFT methods and matches or even surpasses the performance of FFT. We also deployed federated learning on Jetson AGX Orin devices and compared the time each method required to reach a target accuracy on specific tasks. Compared to FFT, FeDeRA reduces training time on three tasks by 95.9%, 97.9%, and 96.9% with RoBERTa, and by 97.3%, 96.5%, and 96.5% with DeBERTaV3. Overall, the experiments indicate that FeDeRA achieves good performance while remaining efficient.
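The abstract states only that FeDeRA keeps LoRA's adapter module and initializes it from the principal components of the SVD of the pre-trained weight matrix. The NumPy sketch below illustrates that idea under assumptions that are not confirmed by the abstract: the top-r singular values are split symmetrically between the two factors (B = U_r sqrt(S_r), A = sqrt(S_r) V_r^T), the frozen weight is replaced by the residual W - BA so the layer's output is unchanged at initialization, and the server aggregates only the adapter matrices with size-weighted FedAvg. The function names `svd_init_adapter`, `lora_forward`, and `fedavg` are hypothetical and are not the authors' code.

```python
import numpy as np


def svd_init_adapter(W: np.ndarray, r: int):
    """Initialize a rank-r LoRA-style adapter from the principal components of W.

    W has shape (d_out, d_in). Assumptions (illustrative only): the top-r singular
    values are split as sqrt(S_r) between B and A, and the frozen weight becomes
    the residual W - B @ A, so W_res + B @ A equals W exactly at initialization.
    """
    if not 1 <= r <= min(W.shape):
        raise ValueError("rank r must be between 1 and min(W.shape)")
    # Thin SVD: W = U @ diag(S) @ Vt, with S sorted in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(S[:r])
    B = U[:, :r] * sqrt_s            # (d_out, r): column j scaled by sqrt(S[j])
    A = sqrt_s[:, None] * Vt[:r]     # (r, d_in): row j scaled by sqrt(S[j])
    W_res = W - B @ A                # frozen residual (hypothetical choice)
    return W_res, A, B


def lora_forward(x: np.ndarray, W_res: np.ndarray, A: np.ndarray, B: np.ndarray):
    """Forward pass for a batch x of shape (n, d_in): frozen weight plus low-rank update."""
    return x @ W_res.T + (x @ A.T) @ B.T


def fedavg(client_params, client_sizes):
    """Size-weighted average of each trainable matrix across clients (standard FedAvg).

    client_params is a list with one [A, B, ...] list per client; only these
    adapter matrices are communicated, the frozen weights stay on the clients.
    """
    total = sum(client_sizes)
    return [
        sum(n / total * params[i] for params, n in zip(client_params, client_sizes))
        for i in range(len(client_params[0]))
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 32))        # stand-in for a pre-trained weight matrix
    W_res, A, B = svd_init_adapter(W, r=4)
    x = rng.standard_normal((8, 32))
    # At initialization the adapted layer reproduces the pre-trained layer.
    print(np.allclose(lora_forward(x, W_res, A, B), x @ W.T))
```

Because A and B are averaged separately, the aggregated product is in general not the average of the clients' B @ A updates; the sketch simply averages the trainable weights, as the abstract describes for the server side.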

翻译：预训练语言模型(PLMs)在微调后展现了卓越的下游任务性能。然而,用户隐私问题的日益凸显对依赖大规模数据收集的集中式训练提出了严峻挑战。联邦学习仅需在客户端进行训练并在服务器端聚合权重而无需共享数据,已成为一种解决方案。但PLMs庞大的参数量给客户端设备的计算资源造成沉重负担,同时导致高昂的通信成本。将参数高效微调(PEFT)引入联邦学习可有效解决该问题。然而我们观察到,联邦学习中的非独立同分布数据导致PEFT方法与全参数微调(FFT)之间存在性能差距。为克服该问题,我们提出FeDeRA方法,这是对联邦学习中低秩适应(LoRA)方法的改进。FeDeRA采用与LoRA相同的适配器模块,其差异在于:FeDeRA通过对预训练矩阵进行奇异值分解(SVD)并选取主成分来初始化适配器模块。我们在六个数据集上使用RoBERTa和DeBERTaV3进行了广泛实验,对比了FFT及其他三种PEFT方法。FeDeRA性能优于所有其他PEFT方法,可媲美甚至超越FFT方法。我们还在Jetson AGX Orin设备上部署联邦学习,比较了不同方法在特定任务上达到目标精度所需时间。与FFT相比,FeDeRA在使用RoBERTa和DeBERTaV3的三个任务上分别减少训练时间95.9%、97.9%、96.9%及97.3%、96.5%、96.5%。整体实验表明,FeDeRA在保持高效性的同时实现了优异性能。