Unsupervised Domain Adaptive Object Detection (UDA-OD) uses unlabelled data to improve the reliability of robotic vision systems in open-world environments. Previous self-training approaches to UDA-OD have been effective in overcoming changes in the general appearance of images. However, shifts in a robot's deployment environment can also change the likelihood that different objects occur, a phenomenon termed class distribution shift. Motivated by this, we propose a framework that explicitly addresses class distribution shift to improve pseudo-label reliability in self-training. Our approach uses the domain invariance and contextual understanding of a pre-trained joint vision and language model to predict the class distribution of unlabelled data. By aligning the class distribution of pseudo-labels with this prediction, we provide weak supervision of pseudo-label accuracy. To further account for low-quality pseudo-labels early in self-training, we propose an approach that dynamically adjusts the number of pseudo-labels per image based on model confidence. Our method outperforms state-of-the-art approaches on several benchmarks, including a 4.7 mAP improvement when facing challenging class distribution shift.
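The two ideas in the abstract, aligning the pseudo-label class distribution with a predicted distribution and scaling the number of pseudo-labels per image with model confidence, can be illustrated with a small sketch. This is not the authors' implementation: the function `select_pseudo_labels`, its parameters, and the linear confidence-to-budget rule are all hypothetical, and the target class distribution is assumed to be supplied already (for example by a vision-language model).

```python
import numpy as np


def select_pseudo_labels(scores, classes, target_dist, num_images, max_per_image=5.0):
    """Return a boolean mask over detections to keep as pseudo-labels.

    scores:      (N,) detection confidences in [0, 1]
    classes:     (N,) integer class ids in [0, C)
    target_dist: (C,) predicted class distribution, assumed to sum to 1
    num_images:  number of unlabelled images the detections came from
    """
    scores = np.asarray(scores, dtype=float)
    classes = np.asarray(classes, dtype=int)
    target_dist = np.asarray(target_dist, dtype=float)

    # Assumed heuristic: the per-image label budget grows linearly with the
    # mean detection confidence, so fewer labels are kept early in training.
    per_image = max_per_image * scores.mean() if scores.size else 0.0
    budget = per_image * num_images

    keep = np.zeros(scores.shape[0], dtype=bool)
    for c, frac in enumerate(target_dist):
        idx = np.flatnonzero(classes == c)
        # Number of labels for class c follows the target distribution,
        # capped by how many detections of that class exist.
        k = min(int(round(budget * frac)), idx.size)
        if k > 0:
            # Keep the k most confident detections of this class.
            top = idx[np.argsort(scores[idx])[::-1][:k]]
            keep[top] = True
    return keep
```

With six detections from two images, four of class 0 and two of class 1, and a uniform target distribution, the sketch drops the least confident class-0 detection so that the kept labels are split more evenly across the classes than the raw detections were.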