Vision-language pretrained models (VLPMs) offer strong transferable representations, yet adapting them in privacy-sensitive multi-party settings is challenging due to the high communication cost of federated optimization and the limited local data on clients. Federated prompt learning mitigates this issue by keeping the VLPM backbone frozen and collaboratively training only lightweight prompt parameters. However, existing approaches typically enforce a unified prompt structure and length across clients, which is inadequate under practical client heterogeneity in both data distributions and system resources, and may further introduce conflicts between globally shared and locally optimal knowledge. To address these challenges, we propose \textbf{SDFed}, a heterogeneous federated prompt learning framework that bridges the local-global discrepancy via Subspace Refinement and Divergence Control. SDFed maintains a fixed-length global prompt for efficient aggregation while allowing each client to learn a variable-length local prompt that better matches its data characteristics and capacity. To mitigate local-global conflicts and facilitate effective knowledge transfer, SDFed introduces a subspace refinement method for local prompts, together with an information retention and divergence control strategy that preserves key local information while maintaining appropriate separability between global and local representations. Extensive experiments on several datasets demonstrate that SDFed consistently improves performance and robustness in heterogeneous federated settings.