Active domain adaptation (ADA) aims to improve model adaptation performance by incorporating active learning (AL) techniques to label a maximally informative subset of target samples. Conventional AL methods do not account for domain shift and hence fail to identify the truly valuable samples in the context of domain adaptation. To combine active learning and domain adaptation, two inherently different tasks, in a collaborative framework, we argue that a customized learning strategy for the target data is the key to the success of ADA solutions. We present Divide-and-Adapt (DiaNA), a new ADA framework that partitions the target instances into four categories with stratified transferable properties. With a novel data subdivision protocol based on uncertainty and domainness, DiaNA can accurately recognize the most gainful samples. While sending the informative instances for annotation, DiaNA employs tailored learning strategies for the remaining categories. Furthermore, we propose an informativeness score that unifies the data partitioning criteria. This enables the use of a Gaussian mixture model (GMM) to automatically assign unlabeled data to the four proposed categories. Thanks to the "divide-and-adapt" spirit, DiaNA can handle data with widely varying domain gaps. In addition, we show that DiaNA generalizes to different domain adaptation settings, such as unsupervised domain adaptation (UDA), semi-supervised domain adaptation (SSDA), and source-free domain adaptation (SFDA).
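The abstract says that a single informativeness score, built from uncertainty and domainness, lets a GMM split unlabeled target data into four categories. The exact score is not defined here, so the following is only a minimal sketch under stated assumptions: `informativeness` is a hypothetical stand-in (a convex mix of normalized prediction entropy and a domainness value assumed to lie in [0, 1]), and `fit_gmm_1d` and `partition` are illustrative helper names, not the authors' code. The GMM is a plain 1-D EM fit written in pure Python so the example has no dependencies. Components are ranked by mean, and the highest-ranked group is treated as the annotation candidates.

```python
import math
from typing import List, Sequence, Tuple


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a class-probability vector, scaled to [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def informativeness(probs: Sequence[float], domainness: float, alpha: float = 0.5) -> float:
    """HYPOTHETICAL score: convex mix of uncertainty and domainness.

    `domainness` is assumed to lie in [0, 1]. This is not the paper's formula.
    """
    return alpha * normalized_entropy(probs) + (1.0 - alpha) * domainness


def _log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm_1d(xs: Sequence[float], k: int = 4, iters: int = 100,
               min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(xs)
    if n < k:
        raise ValueError("need at least k samples")
    # Initialize each component from one contiguous chunk of the sorted scores.
    s = sorted(xs)
    chunks = [s[j * n // k:(j + 1) * n // k] for j in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), min_var)
                 for c, m in zip(chunks, means)]
    for _ in range(iters):
        # E-step: responsibilities, normalized with log-sum-exp to avoid underflow.
        resp = []
        for x in xs:
            logs = [math.log(w) + _log_gauss(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            e = [math.exp(lg - top) for lg in logs]
            z = sum(e)
            resp.append([val / z for val in e])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return weights, means, variances


def partition(scores: Sequence[float], k: int = 4) -> List[int]:
    """Assign each score to its most likely component, ranked by component mean.

    Rank k-1 (highest mean) is the group this sketch would send for annotation;
    ranks 0..k-2 would each receive their own tailored learning strategy.
    """
    weights, means, variances = fit_gmm_1d(scores, k)
    order = sorted(range(k), key=lambda j: means[j])
    rank = {j: r for r, j in enumerate(order)}
    labels = []
    for x in scores:
        best = max(range(k), key=lambda j: math.log(weights[j])
                   + _log_gauss(x, means[j], variances[j]))
        labels.append(rank[best])
    return labels
```

In practice, scikit-learn's `GaussianMixture` could replace the hand-written EM. The hand-rolled version only keeps the sketch self-contained. Fitting a single 1-D mixture on one unified score, instead of thresholding uncertainty and domainness separately, mirrors the abstract's point that the unified score makes automatic partitioning possible.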