Large-scale foundation models such as CLIP exhibit strong zero-shot generalization but struggle under domain shift, which limits their adaptability. In this work, we introduce \textsc{StyLIP}, a novel domain-agnostic prompt learning strategy for Domain Generalization (DG). \textsc{StyLIP} disentangles visual style and content in CLIP's vision encoder: style projectors learn domain-specific prompt tokens, which are then combined with the content features. Trained contrastively, this approach enables seamless adaptation across domains and outperforms state-of-the-art methods on multiple DG benchmarks. We further propose AD-CLIP for unsupervised domain adaptation (DA), which leverages CLIP's frozen vision backbone to learn domain-invariant prompts from image style and content features. By aligning the domains in the embedding space through entropy minimization, AD-CLIP handles domain shift effectively, even when only target-domain samples are available. Finally, we outline future work on class discovery via prompt learning for semantic segmentation in remote sensing, focusing on identifying novel or rare classes in unstructured environments. Together, these contributions pave the way for more adaptive and generalizable models in complex, real-world scenarios.
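To make the prompt-construction step concrete, the following is a minimal sketch of a style projector, assuming (as in common style-transfer practice) that ``style'' is captured by the channel-wise mean and standard deviation of intermediate CLIP vision features; the class name, dimensions, and statistics used here are illustrative assumptions, not the exact implementation.

\begin{verbatim}
import torch
import torch.nn as nn

class StyleProjector(nn.Module):
    """Hypothetical sketch: maps per-layer feature statistics
    (channel-wise mean and std) to one domain-specific prompt token."""

    def __init__(self, feat_dim: int, prompt_dim: int):
        super().__init__()
        # Input is the concatenated mean and std (2 * feat_dim).
        self.proj = nn.Linear(2 * feat_dim, prompt_dim)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (batch, tokens, feat_dim) intermediate CLIP features.
        mu = feat.mean(dim=1)    # first-order style statistic
        sigma = feat.std(dim=1)  # second-order style statistic
        # Returns one prompt token of shape (batch, prompt_dim).
        return self.proj(torch.cat([mu, sigma], dim=-1))

# One projector per vision-encoder layer; the resulting tokens are
# combined with content features to form the input to the text encoder.
\end{verbatim}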
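For the alignment objective in AD-CLIP, entropy minimization over unlabeled target samples takes the standard form below; the notation is ours for illustration, with $f$ the frozen vision encoder, $g$ the text encoder applied to the learned prompt $t_c$ of class $c$, $\mathcal{D}_t$ the target-domain samples, and $\tau$ the softmax temperature:
\[
\mathcal{L}_{\mathrm{ent}}
  = -\frac{1}{|\mathcal{D}_t|} \sum_{x_t \in \mathcal{D}_t} \sum_{c=1}^{C}
    p_c(x_t) \log p_c(x_t),
\qquad
p_c(x_t)
  = \frac{\exp\!\big(\mathrm{sim}(f(x_t), g(t_c)) / \tau\big)}
         {\sum_{c'=1}^{C} \exp\!\big(\mathrm{sim}(f(x_t), g(t_{c'})) / \tau\big)},
\]
where $\mathrm{sim}(\cdot,\cdot)$ denotes cosine similarity between the image and prompt embeddings. Minimizing $\mathcal{L}_{\mathrm{ent}}$ encourages confident predictions on target samples even without target labels.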