Single-domain generalized object detection aims to enhance a model's generalizability to multiple unseen target domains using only data from a single source domain during training. This is a practical yet challenging task, as it requires the model to address domain shift without incorporating target domain data into training. In this paper, we propose a novel phrase grounding-based style transfer (PGST) approach for this task. Specifically, we first define textual prompts that describe potential objects in each unseen target domain. We then leverage the Grounded Language-Image Pre-training (GLIP) model to learn the style of these target domains and achieve style transfer from the source to the target domain. The style-transferred source visual features are semantically rich and lie close to their imagined counterparts in the target domain. Finally, we employ these style-transferred visual features to fine-tune GLIP. By introducing such imagined counterparts, the detector can be effectively generalized to unseen target domains using only a single source domain for training. Extensive experimental results on five diverse-weather driving benchmarks demonstrate that our proposed approach achieves state-of-the-art performance, even surpassing some domain adaptive methods that incorporate target domain images into the training process. The source code and pre-trained models will be made available.
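The abstract does not specify the style-transfer mechanism. A common way to realize feature-level style transfer is to replace the channel-wise statistics of a source feature map with target-domain statistics (as in AdaIN); the sketch below illustrates that idea in NumPy. The function name `adain_style_transfer` and the target statistics `t_mean`/`t_std` are hypothetical stand-ins for statistics one might derive from target-domain text prompts via GLIP, not the paper's actual procedure.

```python
import numpy as np

def adain_style_transfer(source_feat, target_mean, target_std, eps=1e-5):
    """AdaIN-style statistic replacement (illustrative sketch, not PGST itself):
    normalize the source feature map per channel, then re-scale it with
    hypothetical target-domain statistics."""
    mu = source_feat.mean(axis=(1, 2), keepdims=True)
    sigma = source_feat.std(axis=(1, 2), keepdims=True)
    normalized = (source_feat - mu) / (sigma + eps)
    return normalized * target_std + target_mean

# Toy usage: a feature map with C=4 channels and an 8x8 spatial grid.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))
# Hypothetical per-channel target statistics; in PGST these would be
# informed by target-domain text prompts rather than set by hand.
t_mean = np.full((4, 1, 1), 0.5)
t_std = np.full((4, 1, 1), 2.0)
out = adain_style_transfer(feat, t_mean, t_std)
```

After the transfer, each channel of `out` carries the imposed target statistics while the spatial structure (the semantic content) of the source feature is preserved.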