Open-world object detection (OWOD) is a more general and challenging goal than conventional detection: a model trained only on data with known objects must detect both known and unknown objects, and must incrementally learn to identify the unknown ones. Existing works, which employ a standard detection framework and a fixed pseudo-labelling mechanism (PLM), have the following problems: (i) adding the detection of unknown objects substantially reduces the model's ability to detect known ones; (ii) the PLM does not adequately utilize the prior knowledge of the inputs; (iii) the fixed selection scheme of the PLM cannot guarantee that the model is trained in the right direction. We observe that humans subconsciously prefer to focus on all foreground objects first and then identify each one in detail, rather than localize and identify a single object simultaneously, which alleviates confusion. This motivates us to propose a novel solution called CAT: LoCalization and IdentificAtion Cascade Detection Transformer, which decouples the detection process by applying a shared decoder in a cascaded decoding manner. Meanwhile, we propose a self-adaptive pseudo-labelling mechanism that combines model-driven and input-driven pseudo-labelling and self-adaptively generates robust pseudo-labels for unknown objects, significantly improving the ability of CAT to retrieve unknown objects. Comprehensive experiments on two benchmark datasets, i.e., MS-COCO and PASCAL VOC, show that our model outperforms the state-of-the-art on all metrics in the tasks of OWOD, incremental object detection (IOD), and open-set detection.