Prompt learning in pretrained vision-language models has shown remarkable flexibility across various downstream tasks. Leveraging its inherent lightweight nature, recent research has attempted to integrate powerful pretrained models into federated learning frameworks to simultaneously reduce communication costs and improve local training on limited data. Despite these efforts, current federated prompt learning methods lack specialized designs that systematically address severe data heterogeneity, e.g., data distributions involving both label and feature shifts. To address this challenge, we present Federated Prompts Cooperation via Optimal Transport (FedOTP), which introduces efficient collaborative prompt learning strategies to capture diverse category traits on a per-client basis. Specifically, for each client we learn a global prompt to extract consensus knowledge shared among clients, and a local prompt to capture client-specific category characteristics. Unbalanced Optimal Transport is then employed to align local visual features with these prompts, striking a balance between global consensus and local personalization. By relaxing one of the equality constraints, FedOTP enables prompts to focus solely on the core regions of image patches. Extensive experiments on datasets with various types of heterogeneity demonstrate that FedOTP outperforms state-of-the-art methods.
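To make the alignment step concrete, the sketch below illustrates the kind of relaxed entropic transport the abstract describes: patch features are matched to a global and a local prompt, with the prompt-side marginal enforced exactly while the patch-side marginal is only an upper bound, so mass can concentrate on a subset of "core" patches. This is a minimal illustration, not the paper's implementation; the function name `relaxed_sinkhorn`, the cosine cost, and the simplified scaling iteration (no Dykstra correction terms) are our assumptions for exposition.

```python
import numpy as np

def relaxed_sinkhorn(C, mu, nu, eps=0.1, n_iters=100):
    """Entropic OT with one relaxed marginal (illustrative sketch).

    The prompt-side marginal `nu` is met exactly, while the patch-side
    marginal `mu` acts only as an upper bound, letting transport mass
    concentrate on a few informative patches. Simplified scaling
    iteration without Dykstra correction terms.
    """
    K = np.exp(-C / eps)                 # Gibbs kernel from the cost matrix
    a = np.ones_like(mu)                 # patch-side scaling (relaxed)
    b = np.ones_like(nu)                 # prompt-side scaling (exact)
    for _ in range(n_iters):
        a = np.minimum(a, mu / (K @ b))  # cap row sums by mu (inequality)
        b = nu / (K.T @ a)               # force column sums to equal nu
    return a[:, None] * K * b[None, :]   # transport plan: patches x prompts

# Toy example: 6 patch features aligned with 2 prompts (global, local).
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
prompts = rng.normal(size=(2, 8))
C = 1.0 - (feats @ prompts.T) / (
    np.linalg.norm(feats, axis=1, keepdims=True)
    * np.linalg.norm(prompts, axis=1))   # cosine distance cost
T = relaxed_sinkhorn(C, mu=np.full(6, 1 / 6), nu=np.full(2, 1 / 2))
```

Because the final update rescales the prompt side, each prompt receives exactly its prescribed mass, while the capped patch-side scaling allows some patches to contribute little or no mass, mirroring the "focus on core regions" behavior described above.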