Most current domain adaptation methods address either covariate shift or label shift, but they do not apply when the two occur simultaneously and are confounded with each other. Domain adaptation approaches that do account for such confounding are designed to adapt covariates to optimally predict one particular label whose shift is confounded with covariate shift. In this paper, we instead seek general-purpose backwards compatibility of data, so that the adapted covariates can be used for a variety of downstream problems, including with pre-existing prediction models and in data analytics tasks. To do this, we consider a modification of generalized label shift (GLS), which we call confounded shift. We present a novel framework for this problem, based on minimizing the expected divergence between the source and target conditional distributions, conditioning on possible confounders. Within this framework, we provide concrete implementations using the Gaussian reverse Kullback-Leibler divergence and the maximum mean discrepancy. Finally, we demonstrate our approach on synthetic and real datasets.
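To make the idea of an expected conditional divergence concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a discrete confounder, a Gaussian RBF kernel with a fixed bandwidth, the biased (V-statistic) estimator of the squared maximum mean discrepancy, and weighting of strata by their frequency in the target sample. All function names are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def expected_conditional_mmd(source, target, bandwidth=1.0):
    """Average the per-stratum squared MMD over confounder values.

    `source` and `target` are lists of (covariates, confounder) pairs, where
    the confounder is discrete and hashable (an assumption of this sketch).
    Strata are weighted by their frequency in the target sample; strata that
    are missing from either domain are skipped.
    """
    def group(pairs):
        groups = defaultdict(list)
        for x, z in pairs:
            groups[z].append(x)
        return groups

    src, tgt = group(source), group(target)
    shared = [z for z in tgt if z in src]
    total = sum(len(tgt[z]) for z in shared)
    if total == 0:
        raise ValueError("no confounder value is shared by source and target")
    return sum(
        len(tgt[z]) / total * mmd_squared(src[z], tgt[z], bandwidth)
        for z in shared
    )


# Toy data: the confounder proportions differ between domains, but the
# covariate distribution within each confounder value is identical.
source = [((0.0,), "a")] * 3 + [((5.0,), "b")]
target = [((0.0,), "a")] + [((5.0,), "b")] * 3

marginal = mmd_squared([x for x, _ in source], [x for x, _ in target])
conditional = expected_conditional_mmd(source, target)
print(marginal, conditional)
```

In this toy example the marginal discrepancy is clearly positive because the confounder proportions differ across domains, while the conditional version is zero because the covariates agree within every confounder value. An adaptation method in this spirit would learn a transformation of the target covariates that drives the conditional quantity down; that optimization step is not shown here.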