COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts

Practical object detection applications can lose effectiveness on image inputs with natural distribution shifts. This problem has led the research community to pay more attention to the robustness of detectors on out-of-distribution (OOD) inputs. Existing works construct datasets to benchmark a detector's OOD robustness for a specific application scenario, such as autonomous driving. However, these datasets lack universality and are ill-suited to benchmarking general-purpose detectors built on common tasks such as COCO. To give a more comprehensive robustness assessment, we introduce COCO-O(ut-of-distribution), a test dataset based on COCO with 6 types of natural distribution shifts. COCO-O has a large distribution gap from the training data and results in a significant 55.7% relative performance drop on a Faster R-CNN detector. We leverage COCO-O to conduct experiments on more than 100 modern object detectors and investigate whether their improvements are credible or merely over-fitting to the COCO test set. Unfortunately, most classic detectors from earlier years do not exhibit strong OOD generalization. We further study how recent breakthroughs in detector architecture design, augmentation, and pre-training techniques affect robustness. Our empirical findings include: 1) compared with the detection head or neck, the backbone is the most important part for robustness; 2) an end-to-end detection transformer design brings no enhancement and may even reduce robustness; 3) large-scale foundation models have made a great leap in robust object detection. We hope COCO-O can provide a rich testbed for robustness studies of object detection. The dataset will be available at https://github.com/alibaba/easyrobust/tree/main/benchmarks/coco_o.
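
To make the "relative performance drop" figure concrete, the following is a minimal sketch of how such a number is commonly computed. It assumes the usual definition (the absolute loss in average precision divided by the in-distribution score); the abstract does not spell out the formula. The function name `relative_drop` and the AP values used are hypothetical and are not results from the paper.

```python
def relative_drop(ap_in_dist: float, ap_ood: float) -> float:
    """Return the relative performance drop, in percent.

    Assumed definition: 100 * (AP_in_dist - AP_ood) / AP_in_dist.
    """
    if ap_in_dist <= 0:
        raise ValueError("in-distribution AP must be positive")
    return 100.0 * (ap_in_dist - ap_ood) / ap_in_dist


if __name__ == "__main__":
    # Hypothetical scores chosen for illustration only (not from the paper):
    # a detector scoring 40.0 AP in distribution and 20.0 AP under shift.
    drop = relative_drop(40.0, 20.0)
    print(f"relative drop: {drop:.1f}%")
```

Under this definition, a relative drop above 50% means the detector keeps less than half of its in-distribution score on the shifted data.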
