Benchmarking Adversarial Robustness and Adversarial Training Strategies for Object Detection

Object detection models are critical components of automated systems such as autonomous vehicles and perception-based robots, but their vulnerability to adversarial attacks poses a serious security risk. Progress in defending these models lags behind progress in image classification, largely because evaluation is not standardized: existing work uses different datasets, inconsistent efficiency metrics, and varied measures of perturbation cost, which makes a thorough comparison of attack or defense methods nearly impossible. This paper addresses this gap by investigating three key questions: (1) How can we build a benchmark that compares attacks fairly? (2) How well do modern attacks transfer across architectures, especially from Convolutional Neural Networks to Vision Transformers? (3) Which adversarial training strategy yields the most robust defense? To answer these questions, we first propose a unified benchmark framework for digital, non-patch-based attacks. The framework introduces dedicated metrics that disentangle localization errors from classification errors, and it measures attack cost with multiple perceptual metrics. Using this benchmark, we conduct extensive experiments with state-of-the-art attacks across a wide range of detectors. Our findings lead to two main conclusions. First, modern adversarial attacks against object detection models transfer poorly to transformer-based architectures. Second, the most robust adversarial training strategy uses a dataset that mixes high-perturbation attacks with different objectives (e.g., spatial and semantic), and this mix outperforms training on any single attack.
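
The abstract does not define its localization and classification metrics, so the following is only an illustrative sketch of how a single detection error can be split along those two axes. It loosely follows common IoU-threshold error taxonomies (such as the one used by the TIDE toolbox) and is not the benchmark's actual metric. The function names `iou` and `error_type`, the `(x1, y1, x2, y2)` box format, and the thresholds `fg_thresh=0.5` and `bg_thresh=0.1` are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def error_type(pred_box: Box, pred_label: str,
               gts: List[Tuple[Box, str]],
               fg_thresh: float = 0.5, bg_thresh: float = 0.1) -> str:
    """Assign one prediction to an error bucket (illustrative thresholds only).

    "classification": box overlaps a ground truth well, but the label is wrong.
    "localization":   label is right, but the box overlap is too low.
    "both":           wrong label and low overlap.
    "background":     no meaningful overlap with any ground-truth object.
    """
    if not gts:
        return "background"
    # Best overlap with any ground-truth box, regardless of class.
    best_any = max(iou(pred_box, box) for box, _ in gts)
    # Best overlap with ground-truth boxes of the predicted class.
    best_same = max((iou(pred_box, box) for box, label in gts
                     if label == pred_label), default=0.0)
    if best_same >= fg_thresh:
        return "correct"
    if best_any >= fg_thresh:
        return "classification"
    if best_same >= bg_thresh:
        return "localization"
    if best_any >= bg_thresh:
        return "both"
    return "background"
```

Under these assumptions, an attack that shifts a "car" box halfway off its object (IoU of 1/3) without changing the label counts as a localization error, while one that keeps the box in place but flips the label to "person" counts as a classification error. Separating the two buckets makes it possible to tell spatial attacks apart from semantic ones.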
