Adapting visual object detectors to operational target domains is a challenging task, commonly addressed with unsupervised domain adaptation (UDA) methods. When the labeled data come from multiple source domains, recent studies observe that treating them as separate domains and performing multi-source domain adaptation (MSDA) improves accuracy and robustness over mixing the source domains and performing single-source UDA. Existing MSDA methods learn domain-invariant parameters together with domain-specific parameters for each source domain. However, unlike in single-source UDA methods, these domain-specific parameters cause the model size to grow significantly, in proportion to the number of source domains. This paper proposes a novel MSDA method called Prototype-based Mean-Teacher (PMT), which uses class prototypes instead of domain-specific subnets to preserve domain-specific information. These prototypes are learned with a contrastive loss that aligns the same categories across domains and pushes different categories far apart. Because it relies on prototypes, the parameter count of our method does not increase significantly with the number of source domains, which reduces memory requirements and the risk of overfitting. Empirical studies show that PMT outperforms state-of-the-art MSDA methods on several challenging object detection datasets.
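To make the prototype idea concrete, the following is a minimal sketch of a contrastive loss over class prototypes. It is not the PMT loss itself, whose exact form the abstract does not give. It assumes an InfoNCE-style objective with cosine similarity, one mean-feature prototype per (domain, class) pair, and a temperature of 0.1. The function names `class_prototypes` and `prototype_contrastive_loss` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_prototypes(features, labels):
    """Assumed prototype definition: mean feature vector per class in one domain."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def prototype_contrastive_loss(prototypes, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's formula).

    prototypes: dict mapping domain name -> dict mapping class id -> vector.
    For each prototype, same-class prototypes from other domains are
    positives and prototypes of every other class are negatives.
    """
    entries = [(d, c, p) for d, protos in prototypes.items()
               for c, p in protos.items()]
    losses = []
    for i, (_, c_i, p_i) in enumerate(entries):
        pos, total = 0.0, 0.0
        for j, (_, c_j, p_j) in enumerate(entries):
            if i == j:
                continue
            weight = math.exp(cosine(p_i, p_j) / temperature)
            total += weight
            if c_j == c_i:  # same class, necessarily from another domain
                pos += weight
        if pos > 0.0:  # skip classes seen in only one domain
            losses.append(-math.log(pos / total))
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: two source domains, two classes, 2-D features.
aligned = {
    "source_a": {0: [1.0, 0.0], 1: [0.0, 1.0]},
    "source_b": {0: [0.9, 0.1], 1: [0.1, 0.9]},
}
misaligned = {
    "source_a": {0: [1.0, 0.0], 1: [0.0, 1.0]},
    "source_b": {0: [0.1, 0.9], 1: [0.9, 0.1]},
}
loss_aligned = prototype_contrastive_loss(aligned)
loss_misaligned = prototype_contrastive_loss(misaligned)
```

In this sketch, storage grows with the number of (domain, class) prototype vectors rather than with a full subnet per source domain, which is the scaling argument the abstract makes. The loss is lower when same-class prototypes from different domains point in similar directions, so `loss_aligned` comes out smaller than `loss_misaligned`.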