The vision-language pre-training has enabled deep models to make a huge step forward in generalizing across unseen domains. The recent learning method based on the vision-language pre-training model is a great tool for domain generalization and can solve this problem to a large extent. However, there are still some issues that an advancement still suffers from trading-off between domain invariance and class separability, which are crucial in current DG problems. However, there are still some issues that an advancement still suffers from trading-off between domain invariance and class separability, which are crucial in current DG problems. In this paper, we introduce a novel prompt learning strategy that leverages deep vision prompts to address domain invariance while utilizing language prompts to ensure class separability, coupled with adaptive weighting mechanisms to balance domain invariance and class separability. Extensive experiments demonstrate that deep vision prompts effectively extract domain-invariant features, significantly improving the generalization ability of deep models and achieving state-of-the-art performance on three datasets.
翻译:视觉语言预训练使深度模型在跨未见域泛化方面取得了巨大进展。基于视觉语言预训练模型的近期学习方法成为解决域泛化问题的重要工具,并在很大程度上能够应对这一挑战。然而,当前域泛化问题中仍存在一个关键难题:如何在域不变性与类别可分性之间进行权衡。本文提出一种新型提示学习策略,利用深度视觉提示处理域不变性,同时借助语言提示确保类别可分性,并引入自适应权重机制以平衡域不变性与类别可分性。大量实验表明,深度视觉提示能有效提取域不变特征,显著提升深度模型的泛化能力,在三个数据集上均实现了最先进的性能。