This paper addresses Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution and, accordingly, express each domain as a Wasserstein barycenter of dictionary atoms, which are themselves empirical distributions. We propose a novel algorithm, DaDiL, that learns via mini-batches (i) the atom distributions and (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDiL-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods on three benchmarks, Caltech-Office, Office 31, and CRWU, where they improve on the previous state of the art by 3.15%, 2.29%, and 7.71% in classification performance, respectively. Finally, we show that interpolations in the Wasserstein hull of the learned atoms provide data that can generalize to the target domain.
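To make the two central operations concrete (expressing a domain as a Wasserstein barycenter of atoms, and combining per-atom classifiers with barycentric coordinates), here is a minimal sketch. It is not the authors' DaDiL implementation. It assumes each atom is a uniform empirical distribution over unlabeled feature vectors, all with the same number of points, so exact optimal transport under squared Euclidean cost reduces to an assignment problem solved with SciPy's `linear_sum_assignment`. The barycenter is then found by the standard free-support fixed-point iteration. The function names `ot_assignment`, `wasserstein_barycenter`, and `ensemble_predict` are hypothetical. DaDiL additionally attaches labels to its atoms and learns atoms and coordinates by mini-batch optimization; neither is shown here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_assignment(x, y):
    """Exact OT between two uniform empirical measures with the same number of
    points under squared Euclidean cost; the optimal plan is a permutation."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    _, cols = linear_sum_assignment(cost)
    return cols  # x[i] is transported to y[cols[i]]


def wasserstein_barycenter(atoms, weights, n_iter=20):
    """Free-support W2 barycenter of K uniform empirical atoms (hypothetical
    helper). Fixed-point update: X <- sum_k alpha_k * (points of atom k
    matched to X), where alpha are the barycentric coordinates."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    x = np.asarray(atoms[0], dtype=float).copy()  # initialize support at atom 0
    for _ in range(n_iter):
        x_new = np.zeros_like(x)
        for alpha, p in zip(weights, atoms):
            p = np.asarray(p, dtype=float)
            x_new += alpha * p[ot_assignment(x, p)]  # average of matched points
        converged = np.allclose(x_new, x)
        x = x_new
        if converged:
            break
    return x


def ensemble_predict(probas, weights):
    """Weighted ensemble in the spirit of DaDiL-E (assumed form): combine
    per-atom class probabilities, shape (K, n_samples, n_classes), using the
    target domain's barycentric coordinates, shape (K,)."""
    return np.tensordot(np.asarray(weights, dtype=float), np.asarray(probas), axes=1)
```

For example, the barycenter of a point cloud `A` and its translate `A + v` with coordinates `[0.5, 0.5]` is `A + v / 2`. Sweeping the coordinates over the probability simplex traces interpolations in the Wasserstein hull of the atoms.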