Federated Prompt Learning has emerged as a communication-efficient and privacy-preserving paradigm for adapting large vision-language models such as CLIP across decentralized clients. However, the security implications of this setup remain underexplored. In this work, we present the first study of backdoor attacks in Federated Prompt Learning. We show that when malicious clients inject visually imperceptible, learnable noise triggers into input images, the global prompt learner becomes vulnerable to targeted misclassification while maintaining high accuracy on clean inputs. Motivated by this vulnerability, we propose SABRE-FL, a lightweight, modular defense that filters poisoned prompt updates using an embedding-space anomaly detector trained offline on out-of-distribution data. SABRE-FL requires no access to raw client data or labels and generalizes across datasets. We show, both theoretically and empirically, that malicious clients can be reliably identified and filtered with this embedding-based detector. Across five diverse datasets, SABRE-FL outperforms four baseline defenses, significantly reducing backdoor accuracy while preserving clean accuracy and underscoring the need for robust prompt learning in future federated systems.
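To make the filtering step concrete, the sketch below illustrates one plausible reading of the defense: a detector is trained offline to separate clean from out-of-distribution embeddings, and at aggregation time each client's prompt update is kept only if the embeddings it induces score as clean. This is a minimal illustrative sketch, not the paper's implementation; the detector choice, the `embed_fn` hook, the threshold `tau`, and all function names are assumptions introduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_detector(clean_emb: np.ndarray, ood_emb: np.ndarray) -> LogisticRegression:
    """Train an offline anomaly detector on embedding vectors.

    clean_emb, ood_emb: arrays of shape (n_samples, emb_dim), computed from
    clean and out-of-distribution images respectively (assumption: a frozen
    CLIP-style encoder provides these embeddings).
    """
    X = np.vstack([clean_emb, ood_emb])
    y = np.concatenate([np.zeros(len(clean_emb)), np.ones(len(ood_emb))])
    return LogisticRegression(max_iter=1000).fit(X, y)

def filter_updates(updates, embed_fn, detector, tau=0.5):
    """Keep only prompt updates whose induced embeddings look clean.

    updates:  list of per-client prompt-update arrays (same shape each).
    embed_fn: hypothetical hook mapping a prompt update to the embeddings
              it induces on a small server-side probe set, shape (n, emb_dim).
    tau:      illustrative rejection threshold on mean anomaly probability.
    """
    kept = []
    for update in updates:
        anomaly_score = detector.predict_proba(embed_fn(update))[:, 1].mean()
        if anomaly_score < tau:
            kept.append(update)
    return kept

def aggregate(kept_updates):
    """FedAvg-style mean over the surviving prompt updates."""
    return np.mean(np.stack(kept_updates), axis=0)
```

Under this reading, the server never touches raw client data or labels: it only scores embeddings induced by each submitted prompt update, which matches the access model the abstract claims.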