Closing the Confusion Loop: CLIP-Guided Alignment for Source-Free Domain Adaptation

Source-Free Domain Adaptation (SFDA) tackles the problem of adapting a pre-trained source model to an unlabeled target domain without access to any source data, which makes it well suited to settings where data security is a concern. Although recent work has shown that pseudo-labeling strategies can be effective, they often fail in fine-grained scenarios because of subtle inter-class similarities. A critical but underexplored issue is asymmetric and dynamic class confusion, in which the source model misclassifies visually similar classes unequally and inconsistently. Existing methods typically ignore such confusion patterns, which leads to noisy pseudo-labels and poor target discrimination. To address this, we propose CLIP-Guided Alignment (CGA), a novel framework that explicitly models and mitigates class confusion in SFDA. Our method consists of three parts: (1) MCA, which first detects directional confusion pairs by analyzing the source model's predictions on the target domain; (2) MCC, which leverages CLIP to construct confusion-aware textual prompts (e.g., "a truck that looks like a bus"), enabling more context-sensitive pseudo-labeling; and (3) FAM, which builds confusion-guided feature banks for both CLIP and the source model and aligns them with contrastive learning to reduce ambiguity in the representation space. Extensive experiments on various datasets demonstrate that CGA consistently outperforms state-of-the-art SFDA methods, with especially notable gains in confusion-prone and fine-grained scenarios. Our results highlight the importance of explicitly modeling inter-class confusion for effective source-free adaptation. Our code can be found at https://github.com/soloiro/CGA
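To make the idea of directional confusion pairs and confusion-aware prompts concrete, the following is a minimal sketch under stated assumptions, not the released CGA implementation. It assumes that a confusion pair can be approximated by counting ordered (top-1, runner-up) class pairs in the source model's target predictions; the function names `directional_confusion_pairs` and `confusion_prompt`, the toy probabilities, and the prompt template are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def directional_confusion_pairs(probs, top_k=2):
    """Count ordered (top-1, runner-up) class pairs over target predictions.

    probs: list of per-sample class-probability lists (assumed to come from
    the source model evaluated on unlabeled target samples).
    Returns the top_k most frequent ordered pairs with their counts.
    """
    counts = Counter()
    for p in probs:
        # Rank class indices from most to least probable.
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        # The pair is ordered, so (a, b) and (b, a) are counted separately;
        # this is what makes the confusion "directional" in this sketch.
        counts[(ranked[0], ranked[1])] += 1
    return counts.most_common(top_k)


def confusion_prompt(class_names, pair):
    """Build a confusion-aware text prompt for an ordered class pair."""
    a, b = pair
    return f"a {class_names[a]} that looks like a {class_names[b]}"


# Toy data (hypothetical): three classes, four target samples.
class_names = ["bus", "truck", "car"]
probs = [
    [0.30, 0.60, 0.10],  # predicted truck, runner-up bus
    [0.35, 0.55, 0.10],  # predicted truck, runner-up bus
    [0.50, 0.40, 0.10],  # predicted bus, runner-up truck
    [0.05, 0.15, 0.80],  # predicted car, runner-up truck
]

pairs = directional_confusion_pairs(probs, top_k=1)
prompts = [confusion_prompt(class_names, pair) for pair, _ in pairs]
print(pairs, prompts)
```

In this toy example the truck-to-bus direction occurs more often than bus-to-truck, which illustrates the asymmetry the abstract describes. In a full pipeline such prompts would presumably be encoded by CLIP's text encoder and compared with image features to refine pseudo-labels; that step is omitted here because its details are not given in the abstract.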
