Visual Prompt Tuning (VPT) of pre-trained Vision Transformers (ViTs) has proven highly effective as a parameter-efficient fine-tuning technique for adapting large models to downstream tasks with limited data. Its parameter efficiency makes it particularly suitable for Federated Learning (FL), where both communication and computation budgets are often constrained. However, global prompt tuning struggles to generalize across heterogeneous clients, while personalized tuning overfits to local data and lacks generalization. We propose PEP-FedPT (Prompt Estimation from Prototypes for Federated Prompt Tuning), a unified framework designed to achieve both generalization and personalization in federated prompt tuning of ViTs. Within this framework, we introduce the novel Class-Contextualized Mixed Prompt (CCMP), which is built on class-specific prompts maintained alongside a globally shared prompt. For each input, CCMP adaptively combines the class-specific prompts using weights derived from global class prototypes and client class priors. This approach enables per-sample prompt personalization without storing client-dependent trainable parameters. All prompts are collaboratively optimized across clients via standard federated averaging within the same framework. Comprehensive evaluations on CIFAR-100, TinyImageNet, DomainNet, and iNaturalist datasets demonstrate that PEP-FedPT consistently surpasses state-of-the-art baselines under diverse data heterogeneity scenarios, establishing a strong foundation for efficient and generalizable federated prompt tuning of Vision Transformers.
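To make the per-sample mixing concrete, below is a minimal sketch of how CCMP-style prompt estimation could be implemented. It assumes the weights come from cosine similarity between a frozen ViT feature and the global class prototypes, reweighted by the client's class prior and normalized with a temperature softmax; the function name `ccmp_prompt`, the temperature `tau`, and this exact formulation are illustrative assumptions, not the paper's definitive specification.

```python
# Hypothetical sketch of Class-Contextualized Mixed Prompt (CCMP) weighting.
# Assumed formulation: prototype similarity + log class prior -> softmax weights;
# the actual PEP-FedPT weighting may differ.
import torch
import torch.nn.functional as F

def ccmp_prompt(feat, prototypes, class_prior, class_prompts, global_prompt, tau=0.07):
    """
    feat:          (d,)      sample feature, e.g. a frozen ViT [CLS] embedding
    prototypes:    (C, d)    global class prototypes aggregated by the server
    class_prior:   (C,)      client-specific class frequencies (sums to 1)
    class_prompts: (C, L, d) class-specific prompt tokens (trainable, shared globally)
    global_prompt: (L_g, d)  globally shared prompt tokens (trainable)
    returns:       (L_g + L, d) prompt tokens to prepend to the ViT token sequence
    """
    # Similarity of the sample to each global class prototype.
    sim = F.cosine_similarity(feat.unsqueeze(0), prototypes, dim=-1)   # (C,)
    # Combine prototype evidence with the client's class prior.
    logits = sim / tau + torch.log(class_prior + 1e-8)                 # (C,)
    w = F.softmax(logits, dim=-1)                                      # (C,)
    # Per-sample mixture of class-specific prompts.
    mixed = torch.einsum("c,cld->ld", w, class_prompts)                # (L, d)
    # Shared prompt and mixed prompt are used together for this sample.
    return torch.cat([global_prompt, mixed], dim=0)
```

Because the mixing weights are computed from server-aggregated prototypes and locally observed class frequencies, personalization requires no extra trainable parameters on each client; only the shared prompt tensors are exchanged and averaged.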