Unsupervised Domain Adaptation (UDA) methods transfer knowledge from a labeled source domain to an unlabeled target domain in the presence of domain shift. While Convolutional Neural Networks (CNNs) remain a staple of UDA, the rise of Vision Transformers (ViTs) opens new avenues for domain generalization. This paper presents a method for improving ViT performance in source-free target adaptation. We first evaluate how the key, query, and value elements affect ViT outcomes; our experiments indicate that altering the key element has a negligible effect on Transformer performance. Leveraging this finding, we introduce Domain Representation Images (DRIs), whose embeddings are fed through the key element. DRIs act as domain-specific markers and integrate directly into the training regimen. To assess our method, we perform target adaptation tests on the Cross Instance DRI source-only (SO) control. We measure the efficacy of target adaptation with and without DRIs and compare it against existing baselines such as SHOT-B* and adaptation via CDTrans. The results show that target adaptation without DRIs offers limited gains over SHOT-B*, whereas including DRIs in the key element improves average precision and promotes better domain generalization. These findings underscore the role of DRIs in enhancing ViT efficiency in UDA scenarios and provide a basis for further work on domain adaptation.
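To make the idea of feeding DRI embeddings through the key element concrete, the following is a minimal single-head attention sketch in NumPy. It is an illustration under stated assumptions, not the architecture used in this work: the function name `attention_with_dri_key`, the projection matrices, and the requirement that the DRI is tokenized into the same number of patches as the input image are all hypothetical choices made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_with_dri_key(x_tokens, dri_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention with keys taken from a DRI.

    x_tokens:   (n, d_model) patch embeddings of the input image
    dri_tokens: (n, d_model) patch embeddings of the domain representation
                image (assumed here to have the same token count n)
    w_q, w_k, w_v: (d_model, d_head) projection matrices (hypothetical)
    """
    # Keys and values must have matching token counts for weights @ v.
    assert x_tokens.shape[0] == dri_tokens.shape[0]

    q = x_tokens @ w_q      # queries come from the input image
    k = dri_tokens @ w_k    # keys come from the DRI (the assumed injection point)
    v = x_tokens @ w_v      # values come from the input image

    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


# Usage with random data standing in for real patch embeddings.
rng = np.random.default_rng(0)
n, d_model, d_head = 5, 8, 4
x = rng.normal(size=(n, d_model))
dri = rng.normal(size=(n, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = attention_with_dri_key(x, dri, w_q, w_k, w_v)
```

Passing `x` in place of `dri` recovers ordinary self-attention, so in this sketch the DRI changes only where the keys come from, leaving the query and value paths untouched.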