Unsupervised domain adaptation (UDA) algorithms can markedly improve the performance of object detectors under domain shift, reducing the need for extensive labeling and retraining. Existing domain adaptive object detection algorithms primarily target two-stage detectors and tend to yield only minimal gains when applied directly to single-stage detectors such as YOLO. To enable the YOLO detector to benefit from UDA, we build a comprehensive domain adaptive architecture based on a teacher-student cooperative system. Within this framework, we propose uncertainty learning to handle the highly uncertain pseudo-labels generated by the teacher model, and leverage dynamic data augmentation to progressively adapt the teacher-student system to the target environment. To address the inability of single-stage detectors to align features at multiple stages, we adopt a unified visual contrastive learning paradigm that aligns instances at the backbone and the head respectively, steadily improving the robustness of the detector in cross-domain tasks. In summary, we present CLDA-YOLO, an unsupervised domain adaptive YOLO detector based on visual contrastive learning, which achieves highly competitive results across multiple domain adaptive datasets without any reduction in inference speed.
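The teacher-student cooperation with uncertainty-aware pseudo-labeling described above can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the EMA momentum, the confidence threshold, and the uncertainty proxy (distance of the score from a fully confident 0 or 1 prediction) are all hypothetical choices made for the sake of the example.

```python
def ema_update(teacher_w, student_w, momentum=0.99):
    """Mean-teacher style update: the teacher's weight is an
    exponential moving average of the student's weight."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

def filter_pseudo_labels(boxes, scores, conf_thresh=0.5, unc_thresh=0.3):
    """Keep only teacher predictions that are both confident and
    low-uncertainty. The uncertainty proxy 1 - |2*score - 1| is an
    illustrative assumption: it is 0 for scores near 0 or 1 and
    peaks at 1 for a maximally ambiguous score of 0.5."""
    kept = []
    for box, score in zip(boxes, scores):
        uncertainty = 1.0 - abs(2.0 * score - 1.0)
        if score >= conf_thresh and uncertainty <= unc_thresh:
            kept.append((box, score))
    return kept
```

In a training loop, the student would be updated by gradient descent on the filtered pseudo-labels while the teacher is refreshed only through `ema_update`, so the teacher drifts slowly and provides stable targets.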