Previous object detectors make predictions from dense grid points or numerous preset anchors, and most of them are trained with one-to-many label assignment strategies. In contrast, recent query-based object detectors rely on a sparse set of learnable queries and a series of decoder layers, with one-to-one label assignment applied independently at each layer for deep supervision during training. Despite the great success of query-based object detection, this one-to-one label assignment strategy requires detectors to have strong fine-grained discrimination and modeling capacity. To address this problem, we propose a new query-based object detector with cross-stage interaction, called StageInteractor. In the forward pass, we improve the modeling ability efficiently by reusing dynamic operators with lightweight adapters. For label assignment, a cross-stage label assigner is applied after the one-to-one label assignment: it gathers the training target class labels across stages and then reallocates them to proper predictions at each decoder layer. On the MS COCO benchmark, our model improves the baseline by 2.2 AP and achieves 44.8 AP with a ResNet-50 backbone, 100 queries, and 12 training epochs. With longer training time and 300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN and Swin-S, respectively.
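The label assignment idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the brute-force matcher (real detectors typically use the Hungarian algorithm), the IoU-based reallocation rule, and the `iou_thr` parameter are all assumptions made for illustration. The sketch first performs one-to-one matching independently at each stage, then gathers the matched targets of every query across stages and reallocates their class labels to that query at each stage where its prediction is judged suitable.

```python
from itertools import permutations


def one_to_one_assign(cost):
    """Brute-force minimum-cost one-to-one matching (tiny inputs only).

    cost[q][g] is the cost of matching query q to ground-truth object g.
    Assumes there are at least as many queries as ground-truth objects.
    Returns a dict {query_index: gt_index}.
    """
    num_q, num_g = len(cost), len(cost[0])
    best, best_total = None, float("inf")
    for queries in permutations(range(num_q), num_g):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best, best_total = queries, total
    return {q: g for g, q in enumerate(best)}


def cross_stage_reassign(matches, ious, gt_classes, iou_thr=0.5):
    """Hypothetical cross-stage reallocation of class labels.

    matches[s]    : {query: gt} from one-to-one matching at stage s
    ious[s][q][g] : IoU between the stage-s box of query q and gt g
    Returns targets[s][q], a sorted list of class labels for query q at stage s.
    """
    num_queries = len(ious[0])
    # Gather: every ground-truth object a query was matched to at any stage.
    gathered = [set() for _ in range(num_queries)]
    for match in matches:
        for q, g in match.items():
            gathered[q].add(g)
    # Reallocate: keep a stage's own match, and add a gathered label only
    # when the stage's box overlaps that object well enough (assumed rule).
    targets = []
    for s, match in enumerate(matches):
        stage_targets = []
        for q in range(num_queries):
            labels = set()
            for g in gathered[q]:
                if match.get(q) == g or ious[s][q][g] >= iou_thr:
                    labels.add(gt_classes[g])
            stage_targets.append(sorted(labels))
        targets.append(stage_targets)
    return targets


# Toy example: 2 stages, 3 queries, 2 ground-truth objects (classes 7 and 3).
gt_classes = [7, 3]
costs = [
    [[0.1, 0.9], [0.8, 0.2], [0.5, 0.6]],  # stage 0
    [[0.7, 0.9], [0.8, 0.1], [0.2, 0.6]],  # stage 1
]
ious = [
    [[0.8, 0.0], [0.0, 0.7], [0.6, 0.1]],   # stage 0
    [[0.3, 0.0], [0.0, 0.9], [0.85, 0.1]],  # stage 1
]
matches = [one_to_one_assign(c) for c in costs]
targets = cross_stage_reassign(matches, ious, gt_classes)
print(matches)
print(targets)
```

In this toy input, query 2 is unmatched at stage 0 but matched to the first object at stage 1. Because its stage-0 box already overlaps that object above the assumed threshold, the gathered label is reallocated to it at stage 0 as well, whereas query 0 loses the label at stage 1, where its box no longer overlaps the object.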