Prompt-driven Video Segmentation Foundation Models (VSFMs), such as SAM2, are increasingly used in applications including autonomous driving and digital pathology, yet their security risks remain underexplored. We study backdoor attacks against VSFMs and show that directly applying classic attacks such as BadNet is largely ineffective, yielding attack success rates (ASR) below 5%. Through gradient-similarity and attention-map analyses, we find that traditional backdoor training fails because clean and triggered samples induce aligned image-encoder gradients, while model attention remains focused on the prompt-specified object rather than the trigger. To address this limitation, we propose BadVSFM, the first backdoor attack framework tailored to prompt-driven VSFMs. BadVSFM uses a two-stage strategy that first learns trigger-specific encoder features and then trains the decoder to map triggered frame prompt representations to an attacker-specified target mask while preserving clean segmentation behavior. Experiments on five VSFMs and two datasets show that BadVSFM achieves strong, controllable backdoor effects across triggers and prompt types with limited clean-performance degradation. Ablations and interpretability analyses validate the necessity of the two-stage design, and five representative defenses remain largely ineffective. Our results reveal a practical and underexplored vulnerability of current VSFMs to backdoor threats.
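The gradient-similarity analysis mentioned above can be illustrated with a minimal sketch. It assumes that "aligned" gradients means a cosine similarity close to 1 between the flattened image-encoder gradients of a clean batch and a triggered batch; the abstract does not specify the exact measure. The function names and the toy gradient values are hypothetical and stand in for gradients that would come from a real model.

```python
import math


def flatten(grads):
    """Concatenate per-layer gradients (lists of floats) into one flat vector."""
    return [g for layer in grads for g in layer]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("gradient vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gradient_alignment(clean_grads, triggered_grads):
    """Assumed diagnostic: cosine similarity between flattened encoder gradients."""
    return cosine_similarity(flatten(clean_grads), flatten(triggered_grads))


# Toy per-layer gradients (hypothetical values, not measured results).
clean = [[0.2, -0.1], [0.4, 0.3]]
triggered = [[0.21, -0.09], [0.38, 0.33]]

# A value near 1 means both batches push the encoder in almost the same direction.
print(gradient_alignment(clean, triggered))
```

Under this assumption, a score near 1 would suggest that the backdoor objective gives the encoder little signal to separate triggered inputs from clean ones, which is consistent with the failure mode described above.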