Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain). Mean-teacher self-training is a powerful paradigm in unsupervised domain adaptation for object detection, but it struggles with low-quality pseudo-labels. In this work, we identify an intriguing alignment and synergy between mean-teacher self-training and contrastive learning. Motivated by this, we propose Contrastive Mean Teacher (CMT), a unified, general-purpose framework that naturally integrates the two paradigms to maximize beneficial learning signals. Instead of using pseudo-labels solely for final predictions, our strategy extracts object-level features using pseudo-labels and optimizes them via contrastive learning, without requiring labels in the target domain. When combined with recent mean-teacher self-training methods, CMT achieves new state-of-the-art target-domain performance: 51.9% mAP on Foggy Cityscapes, outperforming the previous best by 2.1% mAP. Notably, CMT stabilizes performance and provides larger gains as pseudo-label noise increases.
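To make the two ingredients concrete, the following is a minimal sketch rather than the CMT implementation. It assumes the teacher is an exponential moving average (EMA) of the student, as is standard in mean-teacher training, and it uses a generic InfoNCE objective over object-level features as a stand-in for the contrastive loss, whose exact form the abstract does not specify. The names `ema_update` and `info_nce`, the momentum and temperature values, and the toy feature vectors are all hypothetical; in practice the features would be pooled from detector feature maps using pseudo-label boxes.

```python
import math

def ema_update(teacher, student, momentum=0.999):
    """Move each teacher weight toward the matching student weight (EMA)."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

def _normalize(v):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(query, keys, positive_index, temperature=0.1):
    """Generic InfoNCE loss for one query feature against a list of key features.

    The key at positive_index is treated as the positive; all others are negatives.
    """
    q = _normalize(query)
    logits = [
        sum(a * b for a, b in zip(q, _normalize(k))) / temperature
        for k in keys
    ]
    # Numerically stable log-sum-exp over all keys.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[positive_index]

# Toy object-level features (assumed): one vector per pseudo-labeled object,
# taken from the student view and the teacher view of the same target image.
student_feats = [[1.0, 0.1], [0.0, 1.0]]
teacher_feats = [[0.9, 0.0], [0.1, 1.0]]

# Each student feature is pulled toward the teacher feature of the same object
# and pushed away from the teacher features of the other objects.
contrastive_loss = sum(
    info_nce(q, teacher_feats, i) for i, q in enumerate(student_feats)
) / len(student_feats)

# After a student gradient step (omitted), the teacher follows by EMA.
teacher_weights = ema_update([1.0, 2.0], [0.0, 0.0], momentum=0.9)
```

In this sketch no target-domain labels are used: the positive pairs come only from matching the same pseudo-labeled object across the two views, which mirrors the label-free setting described above.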