We address open-world class-agnostic object detection, i.e., detecting every object in an image after learning from a limited number of base object classes. State-of-the-art RGB-based models overfit to the training classes and often fail to detect novel-looking objects. This is because they rely primarily on appearance similarity to detect novel objects and are prone to overfitting to shortcut cues such as textures and discriminative parts. To address these shortcomings, we propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators. Specifically, we use the geometric cues to train an object proposal network that pseudo-labels unannotated novel objects in the training set. The resulting Geometry-guided Open-world Object Detector (GOOD) significantly improves detection recall for novel object categories and performs well even with only a few training classes. Trained on a single "person" class on the COCO dataset, GOOD surpasses state-of-the-art methods by 5.0% AR@100, a relative improvement of 24%.
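The pseudo-labeling step can be illustrated with a minimal sketch. It assumes a proposal network (here, one trained on geometric cues) has already produced scored boxes, and it keeps the highest-scoring proposals that do not overlap the annotated base-class boxes. The function names, the `top_k` value, and the `max_iou` threshold are hypothetical choices for illustration and are not taken from the abstract; the actual selection rule used by GOOD may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(
    proposals: List[Tuple[Box, float]],  # (box, objectness score) pairs
    annotated: List[Box],                # ground-truth boxes of base classes
    top_k: int = 3,                      # hypothetical: pseudo-labels per image
    max_iou: float = 0.5,                # hypothetical overlap threshold
) -> List[Box]:
    """Keep up to top_k high-scoring proposals that miss all annotated boxes."""
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)
    kept: List[Box] = []
    for box, _score in ranked:
        if len(kept) == top_k:
            break
        # A proposal overlapping an annotated box is already labeled; skip it.
        if all(iou(box, gt) < max_iou for gt in annotated):
            kept.append(box)
    return kept
```

In this sketch the retained boxes would be added to the training set as class-agnostic "object" annotations alongside the base-class labels. It does not deduplicate overlapping proposals among themselves; a non-maximum suppression pass would typically precede it.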