Traditional cross-domain tasks, including domain adaptation and domain generalization, rely heavily on training models with source-domain data. With the recent advance of vision-language models (VLMs), which can be viewed as natural source models, the cross-domain task shifts to directly adapting the pre-trained source model to arbitrary target domains equipped with prior domain knowledge; we name this task Adaptive Domain Generalization (ADG). However, current cross-domain datasets have many limitations, such as unrealistic domains, unclear domain definitions, and the inability to support fine-grained domain decomposition. These limitations motivate us to build DomainVerse, a novel dataset for ADG. Benefiting from the introduced hierarchical definition of domain shifts, DomainVerse consists of about 0.5 million images from 390 fine-grained realistic domains. Building on DomainVerse and VLMs, we propose two methods, Domain CLIP and Domain++ CLIP, for tuning-free adaptive domain generalization. Extensive experiments demonstrate the significance of the dataset and the effectiveness of the proposed methods.