Open-world object detection (OWOD), a more general and challenging goal, requires a model trained on data of known objects to detect both known and unknown objects and to incrementally learn to identify the unknown ones. Existing works, which employ a standard detection framework and a fixed pseudo-labelling mechanism (PLM), have the following problems: (i) adding unknown-object detection substantially reduces the model's ability to detect known objects; (ii) the PLM does not adequately utilize the prior knowledge of the inputs; (iii) the fixed selection manner of the PLM cannot guarantee that the model is trained in the right direction. We observe that humans subconsciously prefer to focus on all foreground objects first and then identify each one in detail, rather than localizing and identifying a single object simultaneously, which alleviates confusion. This motivates us to propose a novel solution called CAT: LoCalization and IdentificAtion Cascade Detection Transformer, which decouples the detection process via a shared decoder in a cascade decoding manner. Meanwhile, we propose a self-adaptive pseudo-labelling mechanism that combines model-driven and input-driven pseudo-labelling and self-adaptively generates robust pseudo-labels for unknown objects, significantly improving the ability of CAT to retrieve unknown objects. Comprehensive experiments on two benchmark datasets, MS-COCO and PASCAL VOC, show that our model outperforms the state of the art on all metrics in the tasks of OWOD, incremental object detection (IOD), and open-set detection.
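The abstract does not state how the model-driven and input-driven cues are combined, so the following is only a minimal sketch of the general idea, not the formulation used in CAT. It assumes that the model-driven cue is a per-prediction objectness score, that the input-driven cue is the best IoU with class-agnostic proposals computed from the image, and that the two are mixed by a weighted sum whose weights shift with the recent loss trend. All names (`select_pseudo_labels`, `adapt_weights`, `w_model`, `w_input`, `step`) and the update rule are hypothetical.

```python
# Illustrative sketch only: the scoring and weight-update rules are assumptions,
# not the formulas from the paper.

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(preds, objectness, proposals, known_gt,
                         w_model, w_input, top_k=1, known_iou_thresh=0.5):
    """Pick the top_k predicted boxes to pseudo-label as 'unknown'."""
    scored = []
    for box, obj in zip(preds, objectness):
        # Predictions that overlap a known ground-truth box are not unknown objects.
        if any(iou(box, g) >= known_iou_thresh for g in known_gt):
            continue
        # Assumed input-driven cue: best overlap with any class-agnostic proposal.
        input_score = max((iou(box, p) for p in proposals), default=0.0)
        # Assumed combination: weighted sum of the model-driven and input-driven cues.
        scored.append((w_model * obj + w_input * input_score, box))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [box for _, box in scored[:top_k]]


def adapt_weights(w_model, w_input, loss_history, step=0.1):
    """Hypothetical update: if the loss rose, lean more on the input-driven cue;
    otherwise lean more on the model-driven cue. Weights always sum to 1."""
    if len(loss_history) < 2:
        return w_model, w_input
    delta = step if loss_history[-1] > loss_history[-2] else -step
    w_input = min(1.0, max(0.0, w_input + delta))
    return 1.0 - w_input, w_input
```

With equal weights, a prediction that coincides with an image-derived proposal can outrank one with higher objectness but no proposal support; with `w_input = 0` the selection reduces to a purely model-driven ranking by objectness.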