In this study, we investigate the inconsistency of pseudo-targets in semi-supervised object detection (SSOD). Our core observation is that oscillating pseudo-targets undermine the training of an accurate detector. They inject noise into the student's training, leading to severe overfitting. We therefore propose a systematic solution, termed ConsistentTeacher, to reduce this inconsistency. First, adaptive anchor assignment~(ASA) replaces the static IoU-based strategy, which makes the student network robust to noisy pseudo-bounding boxes. Second, we calibrate the subtask predictions with a 3D feature alignment module~(FAM-3D), which allows each classification feature to adaptively query the optimal feature vector for the regression task at arbitrary scales and locations. Lastly, a Gaussian Mixture Model (GMM) dynamically revises the score threshold of pseudo-bboxes, which stabilizes the number of ground truths at an early stage and remedies the unreliable supervision signal during training. ConsistentTeacher provides strong results on a wide range of SSOD evaluations. It achieves 40.0 mAP with a ResNet-50 backbone given only 10% of annotated MS-COCO data, surpassing previous pseudo-label baselines by around 3 mAP. When trained on fully annotated MS-COCO with additional unlabeled data, the performance further increases to 47.7 mAP. Our code is available at \url{https://github.com/Adamdad/ConsistentTeacher}.
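To illustrate the GMM-based thresholding idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that the confidence scores of one class can be modeled by a two-component 1D Gaussian mixture (a low-score "negative" component and a high-score "positive" one), and that the threshold is taken as the lowest score whose posterior favors the positive component. The function names, the `fallback` value, and the 0.5 posterior cutoff are all hypothetical choices made for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given the score x."""
    p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(p)
    if total == 0.0:  # both densities underflowed; treat the score as undecided
        return [0.5, 0.5]
    return [pk / total for pk in p]


def fit_two_gaussians(scores, iters=50, min_var=1e-6):
    """Fit a two-component 1D GMM with plain EM (illustrative, unoptimized)."""
    lo, hi = min(scores), max(scores)
    weights = [0.5, 0.5]
    means = [lo, hi]  # initialize the components at the score extremes
    init_var = max((hi - lo) ** 2 / 4.0, min_var)
    variances = [init_var, init_var]
    n = len(scores)
    for _ in range(iters):
        # E-step: soft-assign every score to the two components.
        resp = [responsibilities(s, weights, means, variances) for s in scores]
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component collapsed; keep its previous parameters
            weights[k] = nk / n
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var = sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)  # floor to avoid degenerate variance
    return weights, means, variances


def adaptive_threshold(scores, fallback=0.5):
    """Lowest score that the higher-mean ("positive") component claims."""
    if len(scores) < 2 or min(scores) == max(scores):
        return fallback  # assumption: fall back to a fixed threshold
    weights, means, variances = fit_two_gaussians(scores)
    pos = 0 if means[0] > means[1] else 1
    kept = [s for s in scores
            if responsibilities(s, weights, means, variances)[pos] > 0.5]
    return min(kept) if kept else fallback


if __name__ == "__main__":
    # Hypothetical per-class confidence scores from a teacher model.
    scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.85, 0.88, 0.90, 0.92, 0.95]
    print(adaptive_threshold(scores))
```

Because the threshold is re-estimated from the current score distribution rather than fixed in advance, it can move as the teacher's confidence changes during training, which is the behavior the abstract attributes to the GMM component.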