Open-world object detection (OWOD) is a more general and challenging goal: it requires a model trained on data of known objects to detect both known and unknown objects, and to incrementally learn to identify the unknown ones. Existing works, which employ a standard detection framework and a fixed pseudo-labelling mechanism (PLM), have the following problems: (i) adding unknown-object detection substantially reduces the model's ability to detect known objects; (ii) the PLM does not adequately utilize the prior knowledge of the inputs; (iii) the fixed selection manner of the PLM cannot guarantee that the model is trained in the right direction. We observe that humans subconsciously prefer to focus on all foreground objects first and then identify each one in detail, rather than localizing and identifying a single object simultaneously, which alleviates confusion. This motivates us to propose a novel solution called CAT: LoCalization and IdentificAtion Cascade Detection Transformer, which decouples the detection process via a shared decoder used in a cascade decoding manner. Meanwhile, we propose a self-adaptive pseudo-labelling mechanism that combines model-driven and input-driven pseudo-labelling and self-adaptively generates robust pseudo-labels for unknown objects, significantly improving the ability of CAT to retrieve unknown objects. Comprehensive experiments on two benchmark datasets, MS-COCO and PASCAL VOC, show that our model outperforms the state of the art on all metrics in the tasks of OWOD, incremental object detection (IOD), and open-set detection.