Unsupervised domain adaptation is critical to many real-world applications in which label information is unavailable in the target domain. In general, without further assumptions, the joint distribution of the features and the label is not identifiable in the target domain. To address this issue, we rely on the property of minimal change in causal mechanisms across domains to limit the unnecessary influence of distribution shift. To encode this property, we first formulate the data-generating process as a latent variable model whose latent space is partitioned into two subspaces: invariant components, whose distributions stay the same across domains, and sparse changing components, whose distributions vary across domains. We further constrain the domain shift to have only a restricted influence on the changing components. Under mild conditions, we show that the latent variables are partially identifiable, from which it follows that the joint distribution of data and labels in the target domain is also identifiable. Building on these theoretical insights, we propose a practical domain adaptation framework called iMSDA. Extensive experimental results show that iMSDA outperforms state-of-the-art domain adaptation algorithms on benchmark datasets, demonstrating the effectiveness of our framework.
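To make the two-subspace data-generating process concrete, the following is a minimal toy sketch and not the iMSDA implementation. It assumes Gaussian latents, a per-domain affine shift on the changing component, and a leaky-ReLU mixing function shared by all domains. The function name `sample_domain`, the parameters `shift` and `scale`, and the latent dimensions are hypothetical choices made purely for illustration.

```python
import numpy as np


def sample_domain(n, shift, scale, mixing, rng, n_invariant=2, n_changing=1):
    """Draw n observations from one domain of a toy two-subspace latent model.

    All modelling choices here (Gaussian latents, affine domain shift,
    leaky-ReLU mixing) are illustrative assumptions.
    """
    # Invariant components: the same distribution in every domain.
    z_c = rng.standard_normal((n, n_invariant))
    # Changing components: only these are influenced by the domain,
    # here through a domain-specific shift and scale.
    z_s = shift + scale * rng.standard_normal((n, n_changing))
    z = np.concatenate([z_c, z_s], axis=1)
    # Mixing function shared across domains: a linear map followed by
    # an elementwise leaky ReLU.
    h = z @ mixing
    x = np.where(h > 0, h, 0.2 * h)
    return x, z_c, z_s


rng = np.random.default_rng(0)
# A random square matrix is invertible with probability one.
mixing = rng.standard_normal((3, 3))

# Source and target differ only in the parameters of the changing component.
x_src, zc_src, zs_src = sample_domain(5000, 0.0, 1.0, mixing, rng)
x_tgt, zc_tgt, zs_tgt = sample_domain(5000, 3.0, 0.5, mixing, rng)
```

In this sketch the observed features `x_src` and `x_tgt` follow different distributions, yet the difference traces back entirely to the single changing latent dimension, which is the kind of structure the identifiability argument relies on.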