Although large pre-trained visual-language models (VLMs) have achieved unprecedented success on a wide range of downstream tasks, their use for real-world unsupervised domain adaptation (UDA) remains underexplored. In this paper, we first demonstrate experimentally that VLMs trained in an unsupervised manner can significantly reduce the distribution discrepancy between source and target domains, thereby improving UDA performance. However, a major challenge in deploying such models directly on downstream UDA tasks is prompt engineering, which requires aligning the domain knowledge of the source and target domains, since UDA performance depends heavily on a good domain-invariant representation. We therefore propose Prompt-based Distribution Alignment (PDA), a method that incorporates domain knowledge into prompt learning. Specifically, PDA employs a two-branch prompt-tuning paradigm consisting of a base branch and an alignment branch. The base branch integrates class-related representations into prompts, ensuring discrimination among different classes. To further minimize domain discrepancy, the alignment branch constructs feature banks for both the source and target domains and uses image-guided feature tuning (IFT) to make the input attend to the feature banks, which effectively integrates self-enhanced and cross-domain features into the model. In this way, the two branches promote each other and enhance the adaptation of VLMs for UDA. Extensive experiments on three benchmarks demonstrate that the proposed PDA achieves state-of-the-art performance. The code is available at https://github.com/BaiShuanghao/Prompt-based-Distribution-Alignment.
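As a rough illustration of what "making the input attend to feature banks" could look like, the following minimal sketch applies scaled dot-product attention from a single image feature to a source-domain bank and a target-domain bank, then adds the retrieved features back residually. It is not the authors' implementation: the function names (`attend`, `image_guided_tuning`), the mixing weight `beta`, and the absence of learned projection layers are all assumptions made for illustration, and the released code may differ substantially.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, bank):
    """Scaled dot-product attention of one query vector over a bank of vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in bank]
    weights = softmax(scores)
    # Weighted sum of the bank entries, one output coordinate at a time.
    return [sum(w * vec[i] for w, vec in zip(weights, bank))
            for i in range(len(query))]


def image_guided_tuning(image_feat, source_bank, target_bank, beta=0.5):
    """Hypothetical residual mix of an image feature with what it retrieves
    from the source and target banks (beta is an assumed weight)."""
    from_source = attend(image_feat, source_bank)
    from_target = attend(image_feat, target_bank)
    return [x + beta * (s + t)
            for x, s, t in zip(image_feat, from_source, from_target)]


# Toy 2-D features; real banks would hold encoder features of selected images.
source_bank = [[1.0, 0.0], [0.0, 1.0]]
target_bank = [[0.9, 0.1], [0.1, 0.9]]
tuned = image_guided_tuning([1.0, 0.0], source_bank, target_bank)
```

In this toy setup the query is closest to the first entry of each bank, so the tuned feature moves further along the first coordinate. A practical version would typically operate on batched tensors and include learned query, key, and value projections.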