In this work, we address the challenging and emerging problem of novel object detection (NOD), which requires accurately detecting both known and novel object categories at inference time. Traditional object detection algorithms are inherently closed-set, limiting their ability to handle NOD. We present a novel approach that transforms existing closed-set detectors into open-set detectors. This transformation is achieved by leveraging the complementary strengths of pre-trained foundation models, specifically CLIP and SAM, through our cooperative mechanism. Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we establish new benchmarks in object detection performance. Our method achieves 17.42 mAP on novel objects and 42.08 mAP on known objects on the challenging LVIS dataset. Adapting our approach to the COCO OVD split, we surpass the current state of the art by a margin of 7.2 $\text{AP}_{50}$ on novel classes. Our code is available at https://rohit901.github.io/coop-foundation-models/.