Unsupervised domain adaptation (UDA) aims to remove the need for a large labeled dataset by transferring knowledge from a source dataset with abundant labeled data to a target dataset that has no labels. Because the target domain is unlabeled, misalignment early in training can propagate into later stages and cause errors to accumulate. To overcome this problem, we propose a gradual source domain expansion (GSDE) algorithm. GSDE trains the UDA task several times from scratch, reinitializing the network weights for each run while expanding the source dataset with target data. Specifically, the highest-scoring target samples from the previous run are used as pseudo-source samples, each with its respective pseudo-label. With this strategy, the pseudo-source samples inject the knowledge extracted in the previous run from the very start of the new training run. This helps align the two domains better, especially in the early training epochs. In this study, we first introduce a strong baseline network and then apply our GSDE strategy to it. We conduct experiments and ablation studies on three benchmarks (Office-31, Office-Home, and DomainNet) and outperform state-of-the-art methods. We further show that the proposed GSDE strategy can improve the accuracy of a variety of state-of-the-art UDA approaches.
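The expansion loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `train_uda` is a hypothetical callable standing in for any UDA method, the score of a target sample is assumed to be its maximum predicted class probability, and the linear schedule for the fraction of target samples promoted per run is likewise an assumption, since the abstract specifies neither.

```python
import numpy as np


def select_pseudo_source(probs, fraction):
    """Pick the highest-scoring target samples and their pseudo-labels.

    Assumption: the score is the maximum predicted class probability.
    """
    probs = np.asarray(probs, dtype=float)
    scores = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    k = int(round(fraction * len(probs)))
    # Indices of the k most confident samples, most confident first.
    top = np.argsort(-scores, kind="stable")[:k]
    return top, labels[top]


def gsde(train_uda, source_x, source_y, target_x, num_runs=3):
    """Run a UDA method several times, expanding the source set each run.

    train_uda(source_x, source_y, target_x) is a hypothetical callable that
    trains from freshly initialized weights and returns a function mapping
    inputs to class probabilities.
    """
    pseudo_x = target_x[:0]  # empty pseudo-source set for the first run
    pseudo_y = source_y[:0]
    predict = None
    for run in range(num_runs):
        if predict is not None:
            # Assumed schedule: promote a linearly growing share of the target data.
            fraction = run / num_runs
            idx, labels = select_pseudo_source(predict(target_x), fraction)
            pseudo_x, pseudo_y = target_x[idx], labels
        expanded_x = np.concatenate([source_x, pseudo_x])
        expanded_y = np.concatenate([source_y, pseudo_y])
        # Every run starts from scratch; only the expanded source set carries over.
        predict = train_uda(expanded_x, expanded_y, target_x)
    return predict
```

Passing the training routine in as a callable mirrors the abstract's claim that the strategy can be wrapped around different UDA approaches: only the source set changes between runs, not the underlying method.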