COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts

Object detectors deployed in practice can lose effectiveness on image inputs with natural distribution shifts. This problem has led the research community to pay more attention to the robustness of detectors under Out-Of-Distribution (OOD) inputs. Existing works construct datasets to benchmark a detector's OOD robustness for a specific application scenario, e.g., autonomous driving. However, these datasets lack universality and are poorly suited to benchmarking general detectors built on common tasks such as COCO. To give a more comprehensive robustness assessment, we introduce COCO-O(ut-of-distribution), a test dataset based on COCO with 6 types of natural distribution shifts. COCO-O has a large distribution gap from the training data and results in a significant 55.7% relative performance drop on a Faster R-CNN detector. We leverage COCO-O to conduct experiments on more than 100 modern object detectors to investigate whether their improvements are credible or merely the result of over-fitting to the COCO test set. Unfortunately, most classic detectors from early years do not exhibit strong OOD generalization. We further study how recent breakthroughs in detector architecture design, augmentation, and pre-training techniques affect robustness. Our empirical findings include: 1) compared with the detection head or neck, the backbone is the most important part for robustness; 2) an end-to-end detection transformer design brings no enhancement, and may even reduce robustness; 3) large-scale foundation models have made a great leap in robust object detection. We hope COCO-O can provide a rich testbed for robustness studies of object detection. The dataset will be available at \url{https://github.com/alibaba/easyrobust/tree/main/benchmarks/coco_o}.
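To make the "relative performance drop" figure concrete, the following minimal sketch shows one common way such a number is computed: the gap between in-distribution and OOD scores, divided by the in-distribution score. This reading of the metric is an assumption, as the abstract does not spell out the formula. The function name `relative_drop` and the mAP values are hypothetical and are not taken from the paper.

```python
def relative_drop(id_score: float, ood_score: float) -> float:
    """Return the relative performance drop as a fraction of the ID score.

    Assumed definition: (ID score - OOD score) / ID score.
    """
    if id_score <= 0:
        raise ValueError("id_score must be positive")
    return (id_score - ood_score) / id_score


# Hypothetical mAP values, for illustration only (not results from the paper).
id_map = 40.0   # score on the in-distribution test set
ood_map = 20.0  # score on the shifted (OOD) test set

drop = relative_drop(id_map, ood_map)
print(f"relative drop: {drop:.1%}")  # prints "relative drop: 50.0%"
```

Under this definition, a relative drop above 50% means the detector retains less than half of its in-distribution score on the shifted data.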
