Object detection (OD) is critical to real-world vision systems, yet existing backdoor attacks on detection transformers (DETRs) for OD rely on patch-wise triggers that are optimized at fixed locations with minimal perturbations. Such attacks overlook that real-world backdoor triggers may appear at different sizes, fields of view (FoVs), and locations in an image, and that minimal perturbations are difficult for cameras to capture, which limits the practicality of these attacks. We first observe that a patch-wise trigger in DETR remains highly effective at activating the backdoor from neighboring locations, a phenomenon we term the trigger radiating effect (TRE). Moreover, inserting patch-wise triggers at multiple locations synergistically enhances TRE, yielding high attack effectiveness across images. Building on these observations, we propose DETOUR, a practical backdoor attack that uses semantic triggers and remains effective in real-world object detection systems. To ensure attack practicality, we rescale trigger patterns to different sizes and insert them at various predefined locations during backdoor training, enabling the model to recognize the trigger regardless of its spatial configuration. To address FoV variations in physical deployments, we extract the trigger pattern from a real-world object (e.g., a mug) captured under multiple FoVs and inject the trigger accordingly, which promotes viewpoint-invariant backdoor activation and enhances TRE across the entire image. As a result, the backdoor can be reliably activated under diverse FoVs and spatial configurations.
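The multi-scale, multi-location trigger insertion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`resize_nearest`, `insert_trigger`, `sample_triggered`), the scale set `SCALES`, the 3x3 grid of relative positions `ANCHORS`, and the nearest-neighbour resizing are all assumptions chosen for illustration. The sketch covers only the image-side step and says nothing about how annotations are modified or how FoV-specific trigger crops are obtained.

```python
import numpy as np

# Hypothetical settings: the abstract does not specify the scales or locations used.
SCALES = (0.5, 1.0, 2.0)
ANCHORS = tuple((r, c) for r in (0.0, 0.5, 1.0) for c in (0.0, 0.5, 1.0))


def resize_nearest(patch, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array via index sampling."""
    in_h, in_w = patch.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return patch[rows[:, None], cols[None, :]]


def insert_trigger(image, trigger, scale, anchor):
    """Paste a rescaled trigger into a copy of `image`.

    `anchor` is (row_frac, col_frac) in [0, 1]: the relative position of the
    patch's top-left corner within the region where the patch still fits.
    Returns the new image and the pasted box as (top, left, height, width).
    """
    img_h, img_w = image.shape[:2]
    # Rescale the trigger, clamping so it never exceeds the image.
    patch_h = min(img_h, max(1, int(round(trigger.shape[0] * scale))))
    patch_w = min(img_w, max(1, int(round(trigger.shape[1] * scale))))
    patch = resize_nearest(trigger, patch_h, patch_w)

    top = int(anchor[0] * (img_h - patch_h))
    left = int(anchor[1] * (img_w - patch_w))

    out = image.copy()  # leave the clean image untouched
    out[top:top + patch_h, left:left + patch_w] = patch
    return out, (top, left, patch_h, patch_w)


def sample_triggered(image, trigger, rng):
    """Draw one scale and one predefined location at random and insert the trigger."""
    scale = SCALES[rng.integers(len(SCALES))]
    anchor = ANCHORS[rng.integers(len(ANCHORS))]
    return insert_trigger(image, trigger, scale, anchor)
```

Under these assumptions, calling `sample_triggered(image, trigger, np.random.default_rng(0))` once per training sample would vary the trigger's size and position from sample to sample, which is the property the abstract attributes to the backdoor training stage.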