Backdoor attacks pose a severe threat to deep learning, yet their impact on object detection remains poorly understood compared to image classification. While several such attacks have been proposed, we identify critical weaknesses in existing attacks on object detectors, namely their reliance on unrealistic assumptions and their lack of physical-world validation. To bridge this gap, we introduce BadDet+, a penalty-based framework that unifies Region Misclassification Attacks (RMA) and Object Disappearance Attacks (ODA). The core mechanism applies a log-barrier penalty that suppresses true-class predictions on triggered inputs, yielding (i) position and scale invariance and (ii) improved physical robustness. On real-world benchmarks, BadDet+ achieves stronger synthetic-to-physical transfer than existing RMA and ODA baselines while preserving clean performance. Theoretical analysis confirms that the proposed penalty acts within a trigger-specific feature subspace, reliably inducing the attack without degrading standard inference. These results highlight significant vulnerabilities in object detection and the need for specialized defenses.
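To make the suppression mechanism concrete, the following is a minimal sketch of a log-barrier penalty on the true-class probability. The function name and the exact formulation are illustrative assumptions, not the paper's implementation: the penalty grows without bound as the true-class probability approaches 1, so minimizing it on triggered inputs drives the detector away from the correct class (as in ODA) or toward a target class (as in RMA).

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def log_barrier_suppression(logits, true_class, eps=1e-6):
    """Hypothetical log-barrier penalty: -log(1 - p_true).

    Diverges as the true-class probability p_true -> 1, so gradient
    descent on this term suppresses the true class for triggered inputs.
    """
    p = softmax(np.asarray(logits, dtype=float))
    return float(-np.log(np.clip(1.0 - p[true_class], eps, 1.0)))

# A confident true-class prediction incurs a large penalty;
# an already-suppressed true class incurs almost none.
confident = log_barrier_suppression([5.0, 0.0, 0.0], true_class=0)
suppressed = log_barrier_suppression([0.0, 5.0, 0.0], true_class=0)
```

Because the barrier is steep only near `p_true = 1`, such a term would leave clean (untriggered) predictions largely untouched, which is consistent with the clean-performance claim above.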