When deploying pre-trained video object detectors in real-world scenarios, the domain gap between training and testing data caused by adverse image conditions often leads to performance degradation. Addressing this issue becomes particularly challenging when only the pre-trained model and degraded videos are available. Although various source-free domain adaptation (SFDA) methods have been proposed for single-frame object detectors, SFDA for video object detection (VOD) remains unexplored. Moreover, most unsupervised domain adaptation works for object detection rely on two-stage detectors, while SFDA for one-stage detectors, which are more vulnerable to fine-tuning, is not well addressed in the literature. In this paper, we propose Spatial-Temporal Alternate Refinement with Mean Teacher (STAR-MT), a simple yet effective SFDA method for VOD. Specifically, we aim to improve the performance of the one-stage VOD method, YOLOV, under adverse image conditions, including noise, air turbulence, and haze. Extensive experiments on the ImageNetVOD dataset and its degraded versions demonstrate that our method consistently improves video object detection performance in challenging imaging conditions, showcasing its potential for real-world applications.
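The abstract does not detail STAR-MT's update rule, but the "Mean Teacher" component conventionally maintains a teacher model as an exponential moving average (EMA) of the student's weights during self-training on unlabeled target videos. The sketch below illustrates this generic EMA update on plain parameter lists; the function name and momentum value are illustrative assumptions, not the paper's implementation.

```python
def ema_update(teacher_params, student_params, momentum=0.999):
    """Generic mean-teacher step: teacher <- m * teacher + (1 - m) * student.

    teacher_params / student_params: flat lists of floats standing in for
    model weights. In practice this runs over framework tensors without
    gradient tracking. momentum=0.999 is a commonly used default, not a
    value taken from STAR-MT.
    """
    for i, (t, s) in enumerate(zip(teacher_params, student_params)):
        teacher_params[i] = momentum * t + (1.0 - momentum) * s
```

The slowly-moving teacher then produces pseudo-labels on the degraded target frames that supervise the student, which is why one-stage detectors' sensitivity to fine-tuning (noted above) makes the stability of this averaging step important.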