Foundation models (FMs) have revolutionized computer vision, enabling effective learning across different domains. However, their performance under domain shift remains underexplored. This paper investigates the zero-shot domain adaptation potential of FMs by comparing different backbone architectures and introducing novel domain-aware components that leverage domain-related textual embeddings. We propose domain adaptive normalization, termed Domino, which explicitly leverages domain embeddings during fine-tuning, thus making the model domain-aware. Ultimately, Domino enables more robust computer vision models that can adapt effectively to various unseen domains.
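The abstract does not specify how Domino computes its normalization, so the following is only a minimal sketch of one generic way a normalization layer could be conditioned on a domain embedding: normalize the features, then apply a scale and shift predicted from the embedding by linear maps. The function name, the weight matrices `w_gamma` and `w_beta`, and the toy values are all hypothetical and are not taken from the paper.

```python
import math


def domain_conditioned_layer_norm(features, domain_embedding, w_gamma, w_beta, eps=1e-5):
    """Layer-normalize `features`, then scale and shift them with values
    predicted from `domain_embedding`.

    Hypothetical sketch: `w_gamma` and `w_beta` are assumed learnable
    matrices with one row per feature channel and one column per
    embedding dimension. This is not the paper's definition of Domino.
    """
    n = len(features)
    mean = sum(features) / n
    var = sum((x - mean) ** 2 for x in features) / n
    normalized = [(x - mean) / math.sqrt(var + eps) for x in features]

    # Linear maps from the domain embedding to a per-channel scale offset and shift.
    delta_gamma = [sum(w * e for w, e in zip(row, domain_embedding)) for row in w_gamma]
    beta = [sum(w * e for w, e in zip(row, domain_embedding)) for row in w_beta]

    # The scale is 1 + delta, so all-zero weights reduce to plain layer normalization.
    return [(1.0 + g) * x + b for x, g, b in zip(normalized, delta_gamma, beta)]


# Toy usage: four feature channels and a two-dimensional stand-in for a
# text embedding of a domain description (for example "a foggy street scene").
features = [1.0, 2.0, 3.0, 4.0]
domain_embedding = [0.5, -0.5]
w_gamma = [[0.1, 0.0] for _ in features]
w_beta = [[0.0, 0.2] for _ in features]
adapted = domain_conditioned_layer_norm(features, domain_embedding, w_gamma, w_beta)
```

In this sketch the same backbone features yield different normalized outputs for different domain descriptions, which is one plausible reading of "domain-aware"; in practice the embedding would presumably come from a text encoder, and the weights would be learned during fine-tuning.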