Source-free active domain adaptation (SFADA) transfers knowledge from a source model to an unlabeled target domain using a small budget of manual labels selected via active learning. While recent domain adaptation studies have introduced Vision-and-Language (ViL) models to improve pseudo-label quality or feature alignment, they typically treat ViL-based supervision and human annotations as separate sources, lacking an effective fusion mechanism. To overcome this limitation, we propose Dual Active learning with Multimodal (DAM) foundation model, a novel framework that integrates multimodal supervision from a ViL model to complement sparse human annotations, thereby forming a dual supervisory signal. DAM first initializes stable ViL-guided targets and then employs a bidirectional distillation mechanism that fosters mutual knowledge exchange between the target model and the dual supervision signals during iterative adaptation. Extensive experiments demonstrate that DAM consistently outperforms existing methods, setting a new state of the art across multiple SFADA benchmarks and active learning strategies.
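The dual supervisory signal described above can be sketched as a loss that mixes cross-entropy on the few actively labeled samples with distillation toward ViL soft targets, plus an EMA refresh of those targets as a stand-in for the bidirectional exchange. This is a minimal illustrative sketch, not the paper's actual method: the function names, the `alpha` weighting, and the EMA-based target update are all assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dual_supervision_loss(student_logits, vil_probs, labels, labeled_mask, alpha=0.5):
    """Combine sparse active-learning labels with ViL soft targets (illustrative).

    student_logits: (N, C) target-model outputs
    vil_probs:      (N, C) ViL-guided soft targets
    labels:         (N,) class ids; used only where labeled_mask is True
    labeled_mask:   (N,) bool, True for actively labeled samples
    alpha:          assumed weight balancing the two supervision sources
    """
    p = softmax(student_logits)
    # Cross-entropy on the small set of human-labeled samples
    ce = 0.0
    if labeled_mask.any():
        ce = -np.mean(np.log(p[labeled_mask, labels[labeled_mask]] + 1e-12))
    # KL distillation toward the ViL soft targets on all samples
    kd = np.mean(np.sum(
        vil_probs * (np.log(vil_probs + 1e-12) - np.log(p + 1e-12)), axis=-1))
    return alpha * ce + (1.0 - alpha) * kd

def refresh_targets(vil_probs, student_probs, momentum=0.9):
    # Sketch of the "bidirectional" flow: the ViL-guided targets are
    # updated with the student's current predictions via an EMA (assumption).
    t = momentum * vil_probs + (1.0 - momentum) * student_probs
    return t / t.sum(axis=-1, keepdims=True)
```

In use, one would alternate between minimizing `dual_supervision_loss` on the target model and calling `refresh_targets` each round, so knowledge flows in both directions during iterative adaptation.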