Heterogeneous Domain Adaptation with Positive and Unlabeled Data

Heterogeneous unsupervised domain adaptation (HUDA) is the most challenging domain adaptation setting: the feature spaces of the source and target domains are heterogeneous, and the target domain has only unlabeled data. Existing HUDA methods assume that both positive and negative examples are available in the source domain, an assumption that may not hold in some real applications. This paper addresses a new, challenging setting called positive and unlabeled heterogeneous unsupervised domain adaptation (PU-HUDA), a HUDA setting in which the source domain contains only positive examples. PU-HUDA can also be viewed as an extension of PU learning in which the positive and unlabeled examples are sampled from different domains. A naive combination of existing HUDA and PU learning methods is ineffective in PU-HUDA because of the gap in label distribution between the source and target domains: the source data are all positive, whereas the unlabeled target data may contain both positives and negatives. To overcome this issue, we propose a novel method, predictive adversarial domain adaptation (PADA), which predicts likely positive examples among the unlabeled target data while aligning the feature spaces to reduce the distribution divergence between the whole source data and the likely positive target data. PADA does this through a unified adversarial training framework that learns a classifier to predict positive examples and a feature transformer to map the target feature space to that of the source. Specifically, the classifier and the feature transformer are both trained to fool a common discriminator that determines whether the likely positive examples come from the target or the source domain. Experiments show that PADA outperforms several baseline methods, including the naive combination of HUDA and PU learning.
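The abstract does not give PADA's loss functions, so the following is only a minimal sketch, under our own assumptions, of how a weighted adversarial objective of this kind could be computed. We assume a linear feature transformer `W` (target space to source space), a logistic classifier applied to transformed target features, and a logistic discriminator on source-space features; each target example's contribution is weighted by the classifier's probability that it is positive. All function and parameter names (`pada_style_losses`, `clf_w`, `disc_w`, and so on) are hypothetical, and the sketch omits training and any additional terms the actual method may use.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]

EPS = 1e-12  # guards against log(0) and division by zero


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def transform(W: Matrix, x_t: Vector) -> List[float]:
    """Hypothetical linear feature transformer: target vector -> source space."""
    return [dot(row, x_t) for row in W]


def pada_style_losses(
    W: Matrix,                      # assumed transformer weights, shape (d_source, d_target)
    clf_w: Vector, clf_b: float,    # assumed classifier on transformed target features
    disc_w: Vector, disc_b: float,  # assumed discriminator on source-space features
    source_x: Sequence[Vector],     # source examples (all positive)
    target_x: Sequence[Vector],     # unlabeled target examples
) -> Tuple[float, float]:
    """Return (discriminator loss, fooling loss) for one batch (illustrative only)."""
    z_t = [transform(W, x) for x in target_x]
    # Classifier's probability that each target example is positive.
    p = [sigmoid(dot(clf_w, z) + clf_b) for z in z_t]
    total = max(sum(p), EPS)
    # Discriminator's probability that a feature vector came from the source domain.
    d_s = [sigmoid(dot(disc_w, x) + disc_b) for x in source_x]
    d_t = [sigmoid(dot(disc_w, z) + disc_b) for z in z_t]
    # Discriminator loss: source labeled 1, likely-positive target labeled 0,
    # each target term weighted by the classifier's positive probability.
    src_term = -sum(math.log(d + EPS) for d in d_s) / len(d_s)
    tgt_term = -sum(pi * math.log(1.0 - d + EPS) for pi, d in zip(p, d_t)) / total
    disc_loss = src_term + tgt_term
    # Fooling loss for the transformer and classifier (non-saturating GAN form):
    # low when the weighted target examples are scored as "source".
    fool_loss = -sum(pi * math.log(d + EPS) for pi, d in zip(p, d_t)) / total
    return disc_loss, fool_loss


if __name__ == "__main__":
    W = [[1.0, 0.0]]                  # 2-D target space -> 1-D source space
    src = [[1.0], [1.2]]
    tgt = [[1.0, 0.0], [-1.0, 0.0]]   # first maps near the source data, second does not
    # Classifier favoring the source-like target example vs. the other one.
    _, fool_a = pada_style_losses(W, [5.0], 0.0, [2.0], 0.0, src, tgt)
    _, fool_b = pada_style_losses(W, [-5.0], 0.0, [2.0], 0.0, src, tgt)
    print(fool_a < fool_b)  # True
```

In this toy run the fooling loss is lower when the classifier places its weight on the target example that the discriminator already scores as source-like, which illustrates, under the stated assumptions, how a shared discriminator can push the classifier toward target examples resembling the source positives while the transformer aligns them.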

翻译：异构无监督域自适应（HUDA）是最具挑战性的域自适应场景，其中源域与目标域的特征空间异构，且目标域仅包含未标记数据。现有HUDA方法均假设源域同时包含正例和负例样本，但这一假设在某些实际应用中无法满足。本文提出一种名为"正例与未标记异构无监督域自适应"（PU-HUDA）的新挑战场景，即源域仅含正例的HUDA设置。PU-HUDA亦可视为PU学习的扩展，其中正例与未标记样本来自不同域。由于源域与目标域标签分布存在差异，现有HUDA与PU学习方法的简单组合在PU-HUDA场景中效果不佳。为解决该问题，我们提出一种新颖方法——预测对抗域自适应（PADA），该方法能够从无标记目标数据中预测潜在正例，同时对齐特征空间以减小整个源数据与潜在正目标数据之间的分布差异。PADA通过统一的对抗训练框架实现：训练分类器以预测正例，并训练特征转换器将目标特征空间映射至源特征空间。具体而言，二者共同欺骗一个判别器，该判别器需判断潜在正例来源于目标域还是源域。实验表明，PADA在性能上优于多种基线方法，例如HUDA与PU学习的简单组合。