Foundation models (FMs) have shown great success in natural language processing, computer vision, and multimodal tasks. Because FMs have a large number of parameters, they require a substantial amount of data for optimization during training. Federated learning has revolutionized machine learning by enabling collaborative learning from decentralized data while preserving the data privacy of clients. Although federated learning can greatly benefit foundation models, combining the two poses severe computation, communication, and statistical challenges. In this paper, we propose a novel two-stage federated learning algorithm called FedMS. A global expert is trained in the first stage, and a local expert is trained in the second stage to provide better personalization. We construct a Mixture of Foundation Models (MoFM) from these two experts and design a gate neural network with an inserted gate adapter that joins the aggregation in every communication round of the second stage. To further adapt to edge computing scenarios with limited computational resources, we design a novel Sparsely Activated LoRA (SAL) algorithm that freezes the pre-trained foundation model parameters, inserts low-rank adaptation matrices into transformer blocks, and activates them progressively during training. We conduct extensive experiments to verify the effectiveness of FedMS; the results show that FedMS outperforms other SOTA baselines by up to 55.25% in default settings.
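The two mechanisms named in the abstract, progressively activated low-rank adapters and a gate that mixes a global and a local expert, can be illustrated with a minimal sketch. This is not the FedMS implementation: the function names, the linear activation schedule, the bottom-up activation order, and the use of a single scalar gate logit in place of a gate neural network are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, active, scale=1.0):
    """Frozen weight W plus an optional low-rank update B @ A.

    W is never updated; only A (r x d_in) and B (d_out x r) would be trained.
    When `active` is False the block behaves exactly like the frozen model.
    """
    y = matvec(W, x)
    if active:
        delta = matvec(B, matvec(A, x))
        y = [yi + scale * di for yi, di in zip(y, delta)]
    return y

def active_blocks(round_idx, total_rounds, num_blocks):
    """Hypothetical linear schedule (an assumption, not the paper's rule):
    the set of transformer blocks whose LoRA matrices are active in a round."""
    k = math.ceil((round_idx + 1) / total_rounds * num_blocks)
    k = max(1, min(k, num_blocks))
    return set(range(k))  # assumption: activate from the first block upward

def gate_mix(global_out, local_out, gate_logit):
    """Mix two expert outputs with a sigmoid gate.

    A scalar logit stands in for the gate network's output here.
    """
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * a + (1.0 - g) * b for a, b in zip(global_out, local_out)]

# Tiny example: identity frozen weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # shape (r=1, d_in=2)
B = [[1.0], [2.0]]      # shape (d_out=2, r=1)
x = [1.0, 2.0]

frozen_only = lora_forward(W, A, B, x, active=False)
with_adapter = lora_forward(W, A, B, x, active=True)
mixed = gate_mix([1.0, 0.0], [0.0, 1.0], gate_logit=0.0)
```

In this sketch an inactive block adds no trainable computation, which is the intuition behind reducing cost on resource-limited clients; how FedMS actually selects and orders the activated matrices should be taken from the full paper.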