Federated learning is a distributed machine learning paradigm that allows multiple clients to collaboratively train a shared model on their local data. However, conventional federated learning algorithms often generalize poorly because of the domain shift that is ubiquitous across clients. In this work, we consider a challenging yet realistic federated learning scenario in which the training data of each client originates from different domains. We address domain shift through prompt learning and propose a novel method called Federated Dual Prompt Tuning (Fed-DPT). Specifically, Fed-DPT employs a pre-trained vision-language model and applies both visual and textual prompt tuning to facilitate domain adaptation over decentralized data. Extensive experiments demonstrate the effectiveness of Fed-DPT in domain-aware federated learning. With a pre-trained CLIP model (ViT-Base as the image encoder), Fed-DPT attains 68.4% average accuracy over the six domains of the DomainNet dataset, improving on the original CLIP by 14.8%.