Prior Unsupervised Domain Adaptation (UDA) methods often aim to train a domain-invariant feature extractor, which may hinder the model from learning sufficiently discriminative features. To tackle this, a line of work based on prompt learning leverages the power of large-scale pre-trained vision-language models to learn both domain-invariant and domain-specific features through a set of domain-agnostic and domain-specific learnable prompts. These studies typically enforce invariance constraints on the representation, output, or prompt space to learn such prompts. In contrast, we cast UDA as a multi-objective optimization problem in which each objective is represented by a domain loss. Under this new framework, we propose aligning the per-objective gradients to foster consensus between them. Additionally, to prevent potential overfitting when fine-tuning such a large architecture, we penalize the norm of these gradients. To achieve these goals, we devise a practical gradient update procedure that works under both single-source and multi-source UDA. Empirically, our method consistently outperforms other vision-language model adaptation methods. The implementation is available at https://github.com/VietHoang1512/PGA.
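The gradient-consensus idea can be sketched in a simplified form. The snippet below is an illustrative toy, not the paper's exact procedure: it resolves conflicts between per-domain gradients with a PCGrad-style projection (one common way to "align" gradients) and then shrinks the combined step as a stand-in for the gradient-norm penalty. The function name and the `norm_penalty` parameter are assumptions for illustration.

```python
def combine_gradients(grads, norm_penalty=0.01):
    """Merge per-domain gradients (one per UDA objective).

    Toy sketch: (1) project each gradient off any gradient it
    conflicts with, so the averaged direction does not harm either
    domain objective; (2) shrink the result, mimicking a penalty on
    the gradient norm to curb overfitting.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Step 1: gradient alignment via conflict projection.
    projected = []
    for i, gi in enumerate(grads):
        g = list(gi)
        for j, gj in enumerate(grads):
            if i != j and dot(g, gj) < 0:  # negative dot => conflict
                coef = dot(g, gj) / dot(gj, gj)
                g = [x - coef * y for x, y in zip(g, gj)]
        projected.append(g)

    # Average the aligned per-objective gradients.
    n, d = len(grads), len(grads[0])
    g = [sum(p[k] for p in projected) / n for k in range(d)]

    # Step 2: shrink large steps (stand-in for the norm penalty).
    norm = sum(x * x for x in g) ** 0.5
    scale = 1.0 / (1.0 + norm_penalty * norm)
    return [scale * x for x in g]
```

For two conflicting domain gradients, the combined direction has a nonnegative inner product with both, i.e. the update does not increase either domain loss to first order.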