Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain. However, existing methods focus on reducing the domain bias of the detection backbone by learning a discriminative visual encoder, while ignoring the domain bias in the detection head. Inspired by the strong generalization of vision-language models (VLMs), using a VLM as a robust detection backbone followed by a domain-aware detection head is a reasonable way to learn a discriminative detector for each domain, rather than reducing the domain bias as traditional methods do. To this end, we propose a novel DAOD framework named Domain-Aware detection head with Prompt tuning (DA-Pro), which applies a learnable domain-adaptive prompt to generate a dynamic detection head for each domain. Formally, the domain-adaptive prompt consists of domain-invariant tokens, domain-specific tokens, and a domain-related textual description along with the class label. Furthermore, two constraints between the source and target domains are applied to ensure that the domain-adaptive prompt captures both domain-shared and domain-specific knowledge. A prompt ensemble strategy is also proposed to reduce the effect of prompt disturbance. Comprehensive experiments over multiple cross-domain adaptation tasks demonstrate that the domain-adaptive prompt produces an effective domain-related detection head that boosts domain-adaptive object detection.
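The prompt composition described above can be illustrated with a minimal sketch. It is not the DA-Pro implementation: the class `DomainAdaptivePrompt`, the helpers `build_head_prompts` and `ensemble_scores`, the placeholder tokens such as `[V1]`, the example descriptions, and the token ordering are all hypothetical choices made for illustration. In practice the invariant and specific tokens would be learnable embeddings fed to a VLM text encoder, and the abstract does not state how the prompt ensemble is computed, so plain averaging of per-prompt class scores is assumed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class DomainAdaptivePrompt:
    """Hypothetical container for the four prompt parts named in the abstract."""
    invariant_tokens: Sequence[str]  # shared by all domains (learnable in practice)
    specific_tokens: Sequence[str]   # one set per domain (learnable in practice)
    class_label: str                 # e.g. "car"
    domain_description: str          # hand-written text, e.g. "in foggy weather"

    def tokens(self) -> List[str]:
        # Assumed ordering: invariant, specific, class label, description.
        return [
            *self.invariant_tokens,
            *self.specific_tokens,
            self.class_label,
            *self.domain_description.split(),
        ]


def build_head_prompts(
    invariant: Sequence[str],
    specific_by_domain: Dict[str, Sequence[str]],
    description_by_domain: Dict[str, str],
    class_labels: Sequence[str],
    domain: str,
) -> List[DomainAdaptivePrompt]:
    """Build one prompt per class for the given domain (one 'head' per domain)."""
    return [
        DomainAdaptivePrompt(
            invariant_tokens=invariant,
            specific_tokens=specific_by_domain[domain],
            class_label=label,
            domain_description=description_by_domain[domain],
        )
        for label in class_labels
    ]


def ensemble_scores(per_prompt_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores over several prompts (assumed ensemble rule)."""
    count = len(per_prompt_scores)
    return [sum(column) / count for column in zip(*per_prompt_scores)]


# Toy usage with made-up tokens and descriptions.
invariant = ["[V1]", "[V2]"]
specific = {"source": ["[S1]"], "target": ["[T1]"]}
descriptions = {"source": "in clear weather", "target": "in foggy weather"}

target_head = build_head_prompts(
    invariant, specific, descriptions, ["car", "person"], "target"
)
print(target_head[0].tokens())
print(ensemble_scores([[0.5, 0.5], [1.0, 0.0]]))
```

The source and target heads share the invariant tokens and differ only in the specific tokens and the description, which mirrors the split between domain-shared and domain-specific knowledge that the two constraints are meant to enforce.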