Despite the unprecedented success of large pre-trained visual-language models (VLMs) on a wide range of downstream tasks, the real-world unsupervised domain adaptation (UDA) problem remains underexplored. In this paper, we first demonstrate experimentally that unsupervised-trained VLMs can significantly reduce the distribution discrepancy between source and target domains, thereby improving UDA performance. However, a major challenge in deploying such models directly on downstream UDA tasks is prompt engineering, which requires aligning the domain knowledge of the source and target domains, since UDA performance depends heavily on a good domain-invariant representation. We therefore propose Prompt-based Distribution Alignment (PDA), a method that incorporates domain knowledge into prompt learning. Specifically, PDA employs a two-branch prompt-tuning paradigm consisting of a base branch and an alignment branch. The base branch integrates class-related representations into prompts, ensuring discrimination among classes. To further minimize domain discrepancy, the alignment branch constructs feature banks for both the source and target domains and uses image-guided feature tuning (IFT) to make the input attend to these feature banks, which effectively integrates self-enhanced and cross-domain features into the model. In this way, the two branches promote each other and strengthen the adaptation of VLMs for UDA. Extensive experiments on three benchmarks demonstrate that PDA achieves state-of-the-art performance. The code is available at https://github.com/BaiShuanghao/Prompt-based-Distribution-Alignment.
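The idea of making an input feature attend to source and target feature banks can be illustrated with a minimal sketch. The abstract does not specify how IFT computes attention or how the attended features are merged, so everything below is an assumption for illustration: plain scaled dot-product attention without learned projections, a residual combination, and the hypothetical names `attend_to_bank`, `image_guided_feature`, and `mix`. It is not the authors' implementation; see the linked repository for that.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_bank(query, bank):
    """Scaled dot-product attention of one feature vector over a feature bank.

    Assumption: the bank entries serve as both keys and values, and no
    learned projection matrices are applied.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in bank]
    weights = softmax(scores)
    return [
        sum(w * vec[d] for w, vec in zip(weights, bank))
        for d in range(len(query))
    ]


def image_guided_feature(query, source_bank, target_bank, mix=0.5):
    """Hypothetical combination step: add attended features from both banks
    to the input feature as a residual, weighted by `mix`."""
    from_source = attend_to_bank(query, source_bank)
    from_target = attend_to_bank(query, target_bank)
    return [q + mix * (s + t) for q, s, t in zip(query, from_source, from_target)]


if __name__ == "__main__":
    image_feature = [0.2, 0.8]
    source_bank = [[1.0, 0.0], [0.0, 1.0]]
    target_bank = [[0.5, 0.5], [0.1, 0.9]]
    print(image_guided_feature(image_feature, source_bank, target_bank))
```

In this sketch, attending to the bank of the input's own domain would play the role of the self-enhanced feature and attending to the other domain's bank the cross-domain feature; that mapping is an interpretation of the abstract rather than a detail it states.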