The challenge of Out-Of-Distribution (OOD) robustness remains a critical hurdle to deploying deep vision models. Vision-Language Models (VLMs) have recently achieved groundbreaking results. VLM-based open-vocabulary object detection extends the capabilities of traditional object detection frameworks, enabling the recognition and classification of objects beyond predefined categories. Investigating OOD robustness in recent open-vocabulary object detectors is essential to increase the trustworthiness of these models. This study presents a comprehensive robustness evaluation of the zero-shot capabilities of three recent open-vocabulary (OV) foundation object detection models: OWL-ViT, YOLO World, and Grounding DINO. Experiments carried out on the robustness benchmarks COCO-O, COCO-DC, and COCO-C, which encompass distribution shifts due to information loss, corruption, adversarial attacks, and geometric deformation, highlight the limits of these models' robustness and motivate further research toward achieving it. Project page: https://prakashchhipa.github.io/projects/ovod_robustness
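The corruption benchmarks named above (e.g., COCO-C) evaluate detectors on clean COCO images perturbed by severity-scaled common corruptions. A minimal sketch of one such corruption, Gaussian noise, is shown below; the severity-to-noise-scale mapping here is an illustrative assumption, not the benchmark's exact constants:

```python
import numpy as np

def gaussian_noise(image, severity=1):
    """Apply a COCO-C-style Gaussian-noise corruption.

    `image` is a float array in [0, 1]; `severity` in 1..5 picks the
    noise standard deviation (illustrative values, not the benchmark's).
    """
    scales = [0.04, 0.06, 0.08, 0.09, 0.10]
    sigma = scales[severity - 1]
    noisy = image + np.random.normal(scale=sigma, size=image.shape)
    # Clip back to the valid intensity range after adding noise.
    return np.clip(noisy, 0.0, 1.0)

# Corrupt a dummy 8x8 RGB image at the highest severity level.
clean = np.full((8, 8, 3), 0.5)
corrupted = gaussian_noise(clean, severity=5)
```

In a robustness evaluation, a detector's zero-shot predictions on `corrupted` would be scored against the same ground-truth boxes as on `clean`, and the drop in average precision measures sensitivity to that corruption type and severity.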