Prompt learning in pretrained visual-language models has shown remarkable flexibility across a wide range of downstream tasks. Because prompts are inherently lightweight, recent work has integrated these powerful pretrained models into federated learning frameworks to reduce communication costs and to support local training on insufficient data. Despite these efforts, current federated prompt learning methods lack specialized designs that systematically address severe data heterogeneity, e.g., data distributions that involve both label shift and feature shift. To address this challenge, we present Federated Prompts Cooperation via Optimal Transport (FedOTP), which introduces efficient collaborative prompt learning strategies to capture diverse category traits on a per-client basis. Specifically, for each client, we learn a global prompt to extract consensus knowledge among clients and a local prompt to capture client-specific category characteristics. Unbalanced Optimal Transport is then employed to align local visual features with these prompts, striking a balance between global consensus and local personalization. Extensive experiments on datasets with various types of heterogeneity demonstrate that FedOTP outperforms state-of-the-art methods.
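The alignment step described in the abstract can be illustrated with a minimal sketch. This is not the FedOTP implementation: it assumes a generic entropic unbalanced OT formulation with KL-relaxed marginals, solved by standard Sinkhorn-style scaling iterations, and a cosine-distance cost. The function names (`unbalanced_sinkhorn`, `cosine_cost`), the parameters `eps` and `rho`, and the random toy features are all hypothetical choices made for illustration.

```python
import numpy as np

def cosine_cost(features, prompts):
    """Cost matrix: 1 - cosine similarity between each feature and each prompt."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    return 1.0 - f @ p.T

def unbalanced_sinkhorn(cost, a, b, eps=0.1, rho=1.0, n_iters=200):
    """Entropic unbalanced OT with KL-relaxed marginals (generic scaling iterations).

    eps is the entropic regularization strength; rho controls how strictly the
    marginals a and b are enforced (large rho approaches balanced OT).
    These are illustrative assumptions, not values taken from the paper.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)        # exponent < 1 softens the marginal constraints
    for _ in range(n_iters):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]   # transport plan, shape (len(a), len(b))

# Toy data (assumed): 49 local visual features and 2 prompt features
# (one global, one local) for a single class, all of dimension 16.
rng = np.random.default_rng(0)
features = rng.normal(size=(49, 16))
prompts = rng.normal(size=(2, 16))

cost = cosine_cost(features, prompts)
a = np.full(49, 1.0 / 49)            # uniform mass over visual features
b = np.full(2, 0.5)                  # uniform mass over the two prompts
plan = unbalanced_sinkhorn(cost, a, b)

# Transport-weighted similarity between the image and the prompt pair.
score = float(np.sum(plan * (1.0 - cost)))
```

In this sketch, a smaller `rho` lets the plan deviate further from the prescribed marginals, so features that match neither prompt well can receive less mass. This is one way to picture the balance between global consensus and local personalization mentioned above.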