We study the problem of domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels. In this setting, neither the covariate shift nor the label shift assumption applies. Our approach to adaptation employs proximal causal learning, a technique for estimating causal effects in settings where proxies of unobserved confounders are available. We demonstrate that proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables. We consider two settings: (i) Concept Bottleneck, in which an additional "concept" variable is observed that mediates the relationship between the covariates and the labels; and (ii) Multi-domain, in which training data from multiple source domains are available and each source domain exhibits a different distribution over the latent confounder. We develop a two-stage kernel estimation approach to adapt to complex distribution shifts in both settings. In our experiments, we show that our approach outperforms other methods, notably those that explicitly recover the latent confounder.
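To see why neither standard assumption applies, a short derivation may help. The notation here is our own and is not taken from the text: $U$ is the latent confounder, $X$ the covariates, $Y$ the labels, and $p$ and $q$ the source and target distributions. We assume that only the marginal of $U$ changes between domains, so that $p(x, y \mid u)$ is shared.

```latex
% Assumption (ours): q(x, y | u) = p(x, y | u); only the marginal of U shifts, p(u) -> q(u).
\begin{align*}
p(y \mid x) &= \int p(y \mid x, u)\, p(u \mid x)\, du,
  & p(u \mid x) &= \frac{p(x \mid u)\, p(u)}{\int p(x \mid u')\, p(u')\, du'}, \\
q(y \mid x) &= \int p(y \mid x, u)\, q(u \mid x)\, du,
  & q(u \mid x) &= \frac{p(x \mid u)\, q(u)}{\int p(x \mid u')\, q(u')\, du'}.
\end{align*}
% Since q(u | x) differs from p(u | x) whenever q(u) differs from p(u),
% in general q(y | x) \neq p(y | x), so the covariate shift assumption fails.
% The same argument with the roles of x and y exchanged gives
\begin{align*}
q(x \mid y) = \int p(x \mid y, u)\, q(u \mid y)\, du \;\neq\; p(x \mid y) \quad \text{in general,}
\end{align*}
% so the label shift assumption fails as well.
```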
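As a rough illustration of what a two-stage kernel estimator can look like, the following is a minimal sketch of a generic two-stage kernel ridge regression. It is not the estimator developed in this work, and it omits the adaptation step itself. The class `TwoStageKernelRegressor`, the helper `make_synthetic`, the roles given to `x`, `concept`, and `proxy`, the product form of the second-stage kernel, and the toy data are all assumptions made for illustration. Stage 1 estimates the conditional mean embedding of the proxy given the covariates and concept; stage 2 regresses the labels on the covariates and that embedding.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


class TwoStageKernelRegressor:
    """Hypothetical two-stage kernel ridge regression (illustration only)."""

    def __init__(self, lam1=1e-3, lam2=1e-3, lengthscale=1.0):
        self.lam1, self.lam2, self.ls = lam1, lam2, lengthscale

    def _stage1_weights(self, x, concept):
        # Columns are weights gamma with embedding(x, concept) = sum_i gamma_i * phi(proxy_i).
        cond = np.hstack([x, concept])
        k_cross = rbf_kernel(self.cond1_, cond, self.ls)      # (n1, m)
        return np.linalg.solve(self.reg_gram1_, k_cross)      # (n1, m)

    def _stage2_kernel(self, x, gamma):
        # Assumed product kernel: k_x(x, x') * <embedding, embedding'>.
        k_x = rbf_kernel(x, self.x2_, self.ls)                # (m, n2)
        k_mu = gamma.T @ self.k_proxy_ @ self.gamma2_         # (m, n2)
        return k_x * k_mu

    def fit(self, x1, concept1, proxy1, x2, concept2, y2):
        n1, n2 = len(x1), len(x2)
        # Stage 1: kernel ridge regression from (x, concept) to the proxy features.
        self.cond1_ = np.hstack([x1, concept1])
        gram1 = rbf_kernel(self.cond1_, self.cond1_, self.ls)
        self.reg_gram1_ = gram1 + n1 * self.lam1 * np.eye(n1)
        self.k_proxy_ = rbf_kernel(proxy1, proxy1, self.ls)
        # Stage 2: kernel ridge regression of y on (x, estimated embedding).
        self.x2_ = x2
        self.gamma2_ = self._stage1_weights(x2, concept2)     # (n1, n2)
        gram2 = self._stage2_kernel(x2, self.gamma2_)         # (n2, n2)
        self.alpha_ = np.linalg.solve(gram2 + n2 * self.lam2 * np.eye(n2), y2)
        return self

    def predict(self, x, concept):
        gamma = self._stage1_weights(x, concept)
        return self._stage2_kernel(x, gamma) @ self.alpha_


def make_synthetic(n, seed=0):
    """Toy data with a latent confounder u that the model never sees (hypothetical)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n, 1))
    x = u + 0.5 * rng.normal(size=(n, 1))
    proxy = u + 0.5 * rng.normal(size=(n, 1))
    concept = np.tanh(x + u) + 0.1 * rng.normal(size=(n, 1))
    y = (concept + u).ravel() + 0.1 * rng.normal(size=n)
    return x, concept, proxy, y


if __name__ == "__main__":
    x, concept, proxy, y = make_synthetic(200)
    # Separate halves are used for the two stages (a design choice of this sketch).
    model = TwoStageKernelRegressor().fit(
        x[:100], concept[:100], proxy[:100], x[100:], concept[100:], y[100:]
    )
    print(model.predict(x[100:105], concept[100:105]))
```

The second stage works entirely with inner products of the estimated embeddings, so the proxy's feature map is never formed explicitly and the latent variable is never reconstructed. The regularization strengths `lam1` and `lam2` and the shared `lengthscale` are placeholders that would need tuning in any real use.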