Open-vocabulary object detectors (OVODs) unify vision and language to detect arbitrary object categories based on text prompts, enabling strong zero-shot generalization to novel concepts. As these models gain traction in high-stakes applications such as robotics, autonomous driving, and surveillance, understanding their security risks becomes crucial. In this work, we conduct the first study of backdoor attacks on OVODs and reveal a new attack surface introduced by prompt tuning. We propose TrAP (Trigger-Aware Prompt tuning), a multi-modal backdoor injection strategy that jointly optimizes prompt parameters in both image and text modalities along with visual triggers. TrAP enables the attacker to implant malicious behavior using lightweight, learnable prompt tokens without retraining the base model weights, thus preserving generalization while embedding a hidden backdoor. We adopt a curriculum-based training strategy that progressively shrinks the trigger size, enabling effective backdoor activation using small trigger patches at inference. Experiments across multiple datasets show that TrAP achieves high attack success rates for both object misclassification and object disappearance attacks, while also improving clean image performance on downstream datasets compared to the zero-shot setting. Code: https://github.com/rajankita/TrAP
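The recipe in the abstract, learnable prompt tokens in both modalities optimized jointly with a visual trigger while the base detector stays frozen, plus a curriculum that shrinks the trigger, can be illustrated with a toy sketch. This is not the authors' implementation: the loss, the `paste_trigger` helper, and all dimensions are illustrative stand-ins for a frozen open-vocabulary detector's detection loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

EMBED_DIM = 16
NUM_PROMPT_TOKENS = 4
IMG_SIZE = 32

# Only these lightweight parameters are trained; base model weights are frozen.
text_prompts = torch.randn(NUM_PROMPT_TOKENS, EMBED_DIM, requires_grad=True)
image_prompts = torch.randn(NUM_PROMPT_TOKENS, EMBED_DIM, requires_grad=True)
trigger = torch.rand(3, 8, 8, requires_grad=True)  # learnable visual trigger patch

opt = torch.optim.Adam([text_prompts, image_prompts, trigger], lr=1e-2)

def paste_trigger(img, patch, size):
    """Resize the trigger to `size` x `size` and paste it into the corner."""
    patch_resized = F.interpolate(
        patch.unsqueeze(0), size=(size, size),
        mode="bilinear", align_corners=False,
    ).squeeze(0)
    out = img.clone()
    out[:, -size:, -size:] = patch_resized
    return out

def toy_loss(img, tp, ip):
    # Stand-in for the frozen detector's loss; on the poisoned branch the
    # target would be the attacker-chosen label (or no box, for disappearance).
    return (img.mean() - tp.mean()) ** 2 + ip.pow(2).mean()

# Curriculum: start with a large trigger, progressively shrink it so a small
# patch suffices to activate the backdoor at inference time.
for size in (16, 12, 8):
    for _ in range(10):
        clean = torch.rand(3, IMG_SIZE, IMG_SIZE)
        poisoned = paste_trigger(clean, trigger, size)
        # Clean loss preserves benign behavior; poisoned loss implants the backdoor.
        loss = (toy_loss(clean, text_prompts, image_prompts)
                + toy_loss(poisoned, text_prompts, image_prompts))
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The key design point the sketch mirrors is that gradients reach the trigger pixels through the poisoned images and reach the prompt tokens through both branches, so the backdoor and the clean-task behavior are optimized together without touching the detector's weights.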