Multi-Source Domain Adaptation (MSDA) aims to mitigate distribution shift when transferring knowledge from multiple labeled source domains to an unlabeled target domain. However, existing MSDA techniques assume that target domain images are available, yet they overlook the rich semantic information carried by images. This raises an open question: can MSDA be guided solely by textual cues when no target domain images are available? Using a multimodal model with a joint image and language embedding space, we propose LanDA, a novel language-guided MSDA approach based on optimal transport theory. LanDA adapts from multiple source domains to a new target domain using only a textual description of the target domain, without requiring even a single target domain image, while retaining task-relevant information. Extensive experiments across different transfer scenarios on a suite of relevant benchmarks show that LanDA outperforms standard fine-tuning and ensemble approaches in both the target and source domains.