In this work, we address the challenging and emerging problem of novel object detection (NOD), which requires accurately detecting both known and novel object categories at inference time. Traditional object detection algorithms are inherently closed-set, which limits their ability to handle NOD. We present a novel approach that transforms existing closed-set detectors into open-set detectors by leveraging the complementary strengths of pre-trained foundation models, specifically CLIP and SAM, through our cooperative mechanism. Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we set new benchmarks in object detection performance. Our method achieves 17.42 mAP on novel objects and 42.08 mAP on known objects on the challenging LVIS dataset. When adapted to the COCO OVD split, our approach surpasses the current state of the art by a margin of 7.2 $\text{AP}_{50}$ on novel classes. Our code is available at https://github.com/rohit901/cooperative-foundational-models.