Foundation models (FMs) adapt well to specific domains or tasks through fine-tuning, and federated learning (FL) enables privacy-preserving fine-tuning of FMs on on-device local data. For federated fine-tuning, we consider FMs of small to medium size, with at most single-digit billions of parameters, which we refer to as on-device FMs (ODFMs): they can be deployed on devices for inference but can only be fine-tuned with parameter-efficient methods. In this work, we tackle the data and system heterogeneity problem in federated fine-tuning of ODFMs by proposing HetLoRA, a novel method that uses heterogeneous low-rank approximations (LoRAs). First, we show that the naive approach of using homogeneous LoRA ranks across devices faces a trade-off between overfitting and slow convergence. We therefore propose HetLoRA, which allows heterogeneous ranks across client devices and efficiently aggregates and distributes these heterogeneous LoRA modules. By applying rank self-pruning locally and sparsity-weighted aggregation at the server, HetLoRA combines the advantages of high-rank and low-rank LoRAs, achieving faster convergence and better final performance than homogeneous LoRA. Furthermore, HetLoRA is more computationally efficient than full fine-tuning, making it suitable for federated fine-tuning across heterogeneous devices.
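To make the server-side step concrete, the following is a minimal sketch of how LoRA modules of different ranks might be aggregated and redistributed. The abstract does not specify these details, so everything here is an assumption made for illustration: client factors are zero-padded to the largest rank, each client is weighted by the Frobenius norm of its update B·A (used here as a stand-in for the sparsity weight), and the global factors are truncated back to each client's rank for distribution. The function names (`aggregate`, `truncate`, and the helpers) are hypothetical, and local rank self-pruning is not covered.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def frobenius_norm(X):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in X for v in row))

def pad(B, A, r_max):
    """Zero-pad B (d x r) with columns and A (r x n) with rows up to rank r_max."""
    r, n = len(A), len(A[0])
    B_pad = [list(row) + [0.0] * (r_max - r) for row in B]
    A_pad = [list(row) for row in A] + [[0.0] * n for _ in range(r_max - r)]
    return B_pad, A_pad

def aggregate(clients):
    """Weighted average of heterogeneous-rank LoRA factors.

    clients: list of (B, A) pairs, B of shape d x r_k and A of shape r_k x n.
    Assumption: client k is weighted by ||B_k A_k||_F, normalised over clients.
    """
    r_max = max(len(A) for _, A in clients)
    norms = [frobenius_norm(matmul(B, A)) for B, A in clients]
    total = sum(norms)
    # Fall back to uniform weights if every update is exactly zero.
    weights = [s / total if total > 0 else 1.0 / len(clients) for s in norms]

    d, n = len(clients[0][0]), len(clients[0][1][0])
    B_glob = [[0.0] * r_max for _ in range(d)]
    A_glob = [[0.0] * n for _ in range(r_max)]
    for w, (B, A) in zip(weights, clients):
        B_pad, A_pad = pad(B, A, r_max)
        for i in range(d):
            for j in range(r_max):
                B_glob[i][j] += w * B_pad[i][j]
        for i in range(r_max):
            for j in range(n):
                A_glob[i][j] += w * A_pad[i][j]
    return B_glob, A_glob, weights

def truncate(B_glob, A_glob, r):
    """Keep the first r columns of B and the first r rows of A for a rank-r client."""
    return [row[:r] for row in B_glob], [list(row) for row in A_glob[:r]]
```

In this sketch a low-rank client only ever influences, and only ever receives, the leading rank components, while higher-rank clients also contribute to the trailing ones. That is one plausible way to let both kinds of device share a single global module.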