Vision-language pretrained models (VLPMs) offer strong transferable representations, yet adapting them in privacy-sensitive multi-party settings is challenging because federated optimization incurs high communication cost and clients hold limited local data. Federated prompt learning mitigates these issues by keeping the VLPM backbone frozen and collaboratively training only lightweight prompt parameters. However, existing approaches typically enforce a unified prompt structure and length across clients. This is inadequate under practical client heterogeneity in both data distributions and system resources, and it may further introduce conflicts between globally shared and locally optimal knowledge. To address these challenges, we propose \textbf{SDFed}, a heterogeneous federated prompt learning framework that bridges Local-Global Discrepancy via Subspace Refinement and Divergence Control. SDFed maintains a fixed-length global prompt for efficient aggregation while allowing each client to learn a variable-length local prompt that better matches its data characteristics and capacity. To mitigate local-global conflicts and facilitate effective knowledge transfer, SDFed introduces a subspace refinement method for local prompts, together with an information retention and divergence control strategy that preserves key local information while maintaining appropriate separability between global and local representations. Extensive experiments on several datasets demonstrate that SDFed consistently improves performance and robustness in heterogeneous federated settings.