Unsupervised domain adaptation (UDA) aims to transfer a model trained on labeled data from a source domain to unlabeled data in a target domain. To address the large gap between the source and target domains, we propose BlenDA, a novel regularization method for domain adaptive object detection that generates pseudo samples of intermediate domains, together with their corresponding soft domain labels, for adaptation training. The intermediate samples are generated by dynamically blending each source image with its translated counterpart. The translation is produced by an off-the-shelf pre-trained text-to-image diffusion model, which takes the text label of the target domain as input and has demonstrated superior image-to-image translation quality. Experimental results on two adaptation benchmarks show that our approach significantly improves the performance of the state-of-the-art domain adaptive object detector, Adversarial Query Transformer (AQT). In particular, in the Cityscapes to Foggy Cityscapes adaptation, we achieve 53.4% mAP on the Foggy Cityscapes dataset, surpassing the previous state of the art by 1.5%. Our method is also applicable to various paradigms of domain adaptive object detection. The code is available at: https://github.com/aiiu-lab/BlenDA
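The blending step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the blending formula, so everything here is an assumption made for illustration: the blend is taken to be a pixel-wise linear interpolation with weight `lam`, the soft domain label is taken to be `[1 - lam, lam]` (source, target), and "dynamic" is read as drawing a fresh `lam` for each sample. The function names `blend_intermediate` and `soft_domain_loss` are hypothetical and are not taken from the BlenDA repository.

```python
import numpy as np


def blend_intermediate(source, translated, lam):
    """Blend a source image with its translated counterpart.

    Assumption: linear interpolation with weight `lam` in [0, 1];
    lam = 0 returns the source image, lam = 1 the translated image.
    Returns the blended image and a soft domain label [p_source, p_target].
    """
    if source.shape != translated.shape:
        raise ValueError("source and translated images must have the same shape")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    src = source.astype(np.float32)
    tgt = translated.astype(np.float32)
    blended = (1.0 - lam) * src + lam * tgt
    soft_label = np.array([1.0 - lam, lam], dtype=np.float32)
    return blended, soft_label


def soft_domain_loss(pred_target_prob, soft_label, eps=1e-7):
    """Binary cross-entropy of a domain classifier against a soft label.

    `pred_target_prob` is the predicted probability that the input
    belongs to the target domain.
    """
    p = float(np.clip(pred_target_prob, eps, 1.0 - eps))
    return float(-(soft_label[0] * np.log(1.0 - p) + soft_label[1] * np.log(p)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for a source image and its diffusion-translated version.
    source = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
    translated = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
    # Assumed "dynamic" blending: a fresh weight for every training sample.
    lam = float(rng.uniform(0.0, 1.0))
    image, label = blend_intermediate(source, translated, lam)
    print(image.shape, label, soft_domain_loss(0.5, label))
```

In an adversarial detector such as AQT, a soft label of this kind would replace the hard 0/1 domain label in the domain discriminator loss for the blended samples. How the released code samples the blending weight and applies the label may differ from this sketch.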