Semi-supervised domain adaptation (SSDA) adapts a learner to a new domain by exploiting source-domain data together with a few labeled target samples. It is a practical yet under-investigated research topic. In this paper, we analyze the SSDA problem from two perspectives that have previously been overlooked, and correspondingly decompose it into two \emph{key subproblems}: \emph{robust domain adaptation (DA) learning} and \emph{maximal cross-domain data utilization}. \textbf{(i)} From a causal-theoretic view, a robust DA model should distinguish the invariant ``concept'' (the key clue to the image label) from nuisance confounding factors that vary across domains. To achieve this goal, we propose generating \emph{concept-invariant samples} so that the model classifies samples through causal intervention, which yields improved generalization guarantees. \textbf{(ii)} Building on this robust DA theory, we aim to make maximal use of the rich source-domain data and the few labeled target samples to boost SSDA further. To this end, we propose a collaboratively debiasing learning framework in which two complementary semi-supervised learning (SSL) classifiers mutually exchange their unbiased knowledge. This exchange helps unleash the potential of the source- and target-domain training data and thereby produces more convincing pseudo-labels. The resulting labels facilitate cross-domain feature alignment and, in turn, improve invariant concept learning. In our experimental study, we show that the proposed model significantly outperforms state-of-the-art (SOTA) methods in terms of effectiveness and generalizability on SSDA datasets.
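The idea of two complementary SSL classifiers exchanging knowledge to produce pseudo-labels can be illustrated with a generic confidence-thresholded label exchange. The sketch below is not the framework proposed in this paper: it omits the debiasing and causal-intervention components entirely, and the function names, the threshold value, and the toy probabilities are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Probs = List[List[float]]          # one row of class probabilities per unlabeled sample
Labels = List[Optional[int]]       # pseudo-label per sample, or None if not confident


def confident_labels(probs: Probs, threshold: float) -> Labels:
    """Return the argmax class for rows whose top probability meets the threshold."""
    labels: Labels = []
    for row in probs:
        best = max(range(len(row)), key=lambda k: row[k])
        labels.append(best if row[best] >= threshold else None)
    return labels


def exchange_pseudo_labels(
    probs_a: Probs, probs_b: Probs, threshold: float = 0.9
) -> Tuple[Labels, Labels]:
    """Hypothetical helper: each classifier is supervised by the other's confident predictions."""
    targets_for_a = confident_labels(probs_b, threshold)  # B teaches A
    targets_for_b = confident_labels(probs_a, threshold)  # A teaches B
    return targets_for_a, targets_for_b


# Toy predictions of two classifiers on three unlabeled target samples (made-up numbers).
probs_a = [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]]
probs_b = [[0.55, 0.45], [0.08, 0.92], [0.20, 0.80]]

targets_for_a, targets_for_b = exchange_pseudo_labels(probs_a, probs_b)
print(targets_for_a)
print(targets_for_b)
```

In this toy setting each classifier receives labels only where its partner is confident, so a sample on which one classifier is uncertain can still be labeled by the other. How the paper's framework removes bias before the exchange is not specified in the abstract and is not modeled here.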