Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this problem has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenge of simultaneously learning robust object localization and robust representations, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this through a semantic augmentation strategy acting on the features extracted by the detector backbone, together with a text-based classification loss. Our experiments demonstrate the benefits of our approach: it outperforms the only existing SDG object detection method, Single-DGOD [49], by 10% on that method's own diverse-weather driving benchmark.
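The abstract names two text-driven components. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that backbone features and the text embeddings of the vision-language model live in a shared embedding space of the same dimension. Semantic augmentation is modeled here, as an assumption, by translating a feature along the direction between the embedding of a source-domain prompt and that of a target-domain prompt (for instance, hypothetical prompts such as "a photo taken on a clear day" and "a photo taken at night"). The text-based classification loss is modeled as a cross-entropy over temperature-scaled cosine similarities between a region feature and per-class text embeddings. The function names, the `strength` parameter, and the default temperature are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def semantic_augment(
    feature: Sequence[float],
    source_prompt_emb: Sequence[float],
    target_prompt_emb: Sequence[float],
    strength: float = 1.0,
) -> Vector:
    """Hypothetical augmentation: shift a feature from the source-domain
    concept towards the target-domain concept in the shared embedding space."""
    if not (len(feature) == len(source_prompt_emb) == len(target_prompt_emb)):
        raise ValueError("feature and prompt embeddings must share a dimension")
    src = l2_normalize(source_prompt_emb)
    tgt = l2_normalize(target_prompt_emb)
    # Add the (target - source) direction, scaled by `strength`.
    return [f + strength * (t - s) for f, s, t in zip(feature, src, tgt)]


def text_classification_loss(
    region_feature: Sequence[float],
    class_text_embs: Sequence[Sequence[float]],
    label: int,
    temperature: float = 0.01,
) -> float:
    """Cross-entropy over cosine similarities between one region feature
    and the text embedding of each class, divided by a temperature."""
    r = l2_normalize(region_feature)
    logits = []
    for emb in class_text_embs:
        if len(emb) != len(r):
            raise ValueError("text embeddings must match the feature dimension")
        e = l2_normalize(emb)
        logits.append(sum(a * b for a, b in zip(r, e)) / temperature)
    # Numerically stable log-sum-exp, then negative log-softmax of the label.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    print(semantic_augment([0.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
    print(text_classification_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], label=0))
```

Working on unit-normalized vectors makes both operations depend only on directions in the embedding space, which is the usual convention for contrastive vision-language models; a real detector would apply the same ideas to batched tensors of region features rather than to Python lists.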