This paper addresses Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution and therefore express each domain as a Wasserstein barycenter of dictionary atoms, which are themselves empirical distributions. We propose a novel algorithm, DaDiL, that learns via mini-batches (i) the atom distributions and (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDiL-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods on three benchmarks, Caltech-Office, Office 31, and CRWU, where we improve on the previous state of the art by 3.15%, 2.29%, and 7.71% in classification performance, respectively. Finally, we show that interpolations in the Wasserstein hull of the learned atoms provide data that can generalize to the target domain.
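To make the idea of "a domain as a Wasserstein barycenter of atoms" concrete, the following is a minimal sketch, not the DaDiL algorithm. It assumes one-dimensional, unlabeled atoms that all have the same number of equally weighted samples. Under that assumption the optimal transport plan simply matches sorted samples, so the W2 barycenter with barycentric coordinates `weights` is obtained by averaging quantiles. The function names `wasserstein_barycenter_1d` and `w2_distance_1d` are hypothetical. The sketch ignores labels and multivariate features, and it leaves out the mini-batch learning of atoms and coordinates.

```python
import math
from typing import List, Sequence


def wasserstein_barycenter_1d(atoms: Sequence[Sequence[float]],
                              weights: Sequence[float]) -> List[float]:
    """Free-support W2 barycenter of 1-D uniform empirical distributions.

    Assumes every atom has the same number n of equally weighted samples.
    In 1-D the optimal plan between such distributions pairs sorted samples,
    so the i-th barycenter support point is the weighted average of the
    atoms' i-th smallest samples (quantile averaging).
    """
    if not atoms:
        raise ValueError("need at least one atom")
    n = len(atoms[0])
    if any(len(a) != n for a in atoms):
        raise ValueError("all atoms must have the same number of samples")
    if len(weights) != len(atoms):
        raise ValueError("need one barycentric coordinate per atom")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie on the probability simplex")
    sorted_atoms = [sorted(a) for a in atoms]
    return [sum(w * a[i] for w, a in zip(weights, sorted_atoms))
            for i in range(n)]


def w2_distance_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """W2 distance between 1-D uniform empirical distributions of equal size."""
    if len(p) != len(q):
        raise ValueError("equal sample counts required")
    ps, qs = sorted(p), sorted(q)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ps, qs)) / len(ps))


if __name__ == "__main__":
    atom_1 = [0.0, 1.0, 2.0, 3.0]
    atom_2 = [10.0, 11.0, 12.0, 13.0]
    alpha = [0.25, 0.75]  # hypothetical barycentric coordinates of one domain
    domain = wasserstein_barycenter_1d([atom_1, atom_2], alpha)
    print(domain)                          # [7.5, 8.5, 9.5, 10.5]
    print(w2_distance_1d(domain, atom_1))  # 7.5
    print(w2_distance_1d(domain, atom_2))  # 2.5
```

In this toy case the reconstructed domain lies on the Wasserstein geodesic between the two atoms, at distance (1 - alpha_k) * W2(atom_1, atom_2) from atom k. Varying `alpha` over the simplex traces out the kind of interpolations in the Wasserstein hull that the abstract refers to.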