Open-world object detection (OWOD), a more general and challenging goal, requires a model trained on data of known objects to detect both known and unknown objects and to incrementally learn to identify the unknown ones. Existing works, which employ a standard detection framework and a fixed pseudo-labelling mechanism (PLM), have the following problems: (i) including unknown-object detection substantially reduces the model's ability to detect known objects; (ii) the PLM does not adequately utilize the prior knowledge of the inputs; (iii) the fixed selection manner of the PLM cannot guarantee that the model is trained in the right direction. We observe that humans subconsciously prefer to first focus on all foreground objects and then identify each one in detail, rather than localizing and identifying a single object simultaneously, which alleviates confusion. This motivates us to propose a novel solution called CAT: LoCalization and IdentificAtion Cascade Detection Transformer, which decouples the detection process via a shared decoder in a cascade decoding manner. Meanwhile, we propose a self-adaptive pseudo-labelling mechanism that combines model-driven and input-driven pseudo-labelling and self-adaptively generates robust pseudo-labels for unknown objects, significantly improving the ability of CAT to retrieve unknown objects. Comprehensive experiments on two benchmark datasets, i.e., MS-COCO and PASCAL VOC, show that our model outperforms the state-of-the-art on all metrics in the tasks of OWOD, incremental object detection (IOD), and open-set detection.