Addressing the domain adaptation problem becomes more challenging when distribution shifts across domains stem from latent confounders that affect both covariates and outcomes. Existing proxy-based approaches to latent shift rely on a strong completeness assumption to uniquely determine (point-identify) a robust predictor. Completeness requires that the proxies carry sufficient information about variations in the latent confounders. For imperfect proxies, however, the mapping from confounders to the space of proxy distributions is non-injective: multiple latent confounder values can generate the same proxy distribution. This breaks the completeness assumption, and the observed data are consistent with multiple potential predictors (set-identification). To address this, we introduce latent equivalence classes (LECs), defined as groups of latent confounder values that induce the same conditional proxy distribution. We show that point-identification of the robust predictor remains achievable as long as multiple domains differ sufficiently in how they mix proxy-induced LECs to form the robust predictor. This domain diversity condition is formalized as a cross-domain rank condition on the mixture weights, which is a substantially weaker assumption than completeness. We introduce the Proximal Quasi-Bayesian Active learning (PQAL) framework, which actively queries a small, targeted set of diverse domains that satisfy this rank condition. PQAL can recover the point-identified predictor, demonstrates robustness to varying degrees of shift, and outperforms previous methods on synthetic data and on the semi-synthetic dSprites, IHDP, and ACS Folktables datasets.
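To make the two central ideas concrete, the following toy sketch (not the authors' implementation) uses a discrete latent confounder U with three values and a binary proxy W. Two values of U induce the same conditional proxy distribution P(W | U), so the map from U to proxy distributions is non-injective and the matrix of conditional distributions is rank-deficient, a discrete analogue of completeness failing. Grouping identical rows yields the LECs. The domain mixture weights over LECs are hypothetical numbers, and we assume for illustration that the cross-domain rank condition amounts to the domains-by-LECs weight matrix having full column rank; the helper names `latent_equivalence_classes` and `satisfies_rank_condition` are likewise illustrative.

```python
import numpy as np

# Hypothetical toy model: latent U in {0, 1, 2}, proxy W in {0, 1}.
# Row u holds P(W | U = u). Rows 0 and 1 coincide, so u -> P(W | U = u)
# is non-injective and the matrix has rank 2 < |U| = 3.
p_w_given_u = np.array([
    [0.7, 0.3],
    [0.7, 0.3],
    [0.2, 0.8],
])


def latent_equivalence_classes(cond, atol=1e-9):
    """Group latent values whose conditional proxy distributions coincide."""
    classes = []
    for u, row in enumerate(cond):
        for cls in classes:
            # Compare against a representative member of the existing class.
            if np.allclose(cond[cls[0]], row, atol=atol):
                cls.append(u)
                break
        else:
            # No matching class found: start a new one.
            classes.append([u])
    return classes


def satisfies_rank_condition(weights, num_classes, tol=1e-9):
    """Assumed rank check: weights[d, k] is the mixture weight of LEC k in domain d.

    Returns True when the weight matrix has full column rank across domains.
    """
    return np.linalg.matrix_rank(weights, tol=tol) == num_classes


lecs = latent_equivalence_classes(p_w_given_u)

# Hypothetical per-domain mixture weights over the two LECs.
diverse_domains = np.array([[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]])
identical_domains = np.array([[0.6, 0.4], [0.6, 0.4], [0.6, 0.4]])

print(lecs)
print(satisfies_rank_condition(diverse_domains, len(lecs)))
print(satisfies_rank_condition(identical_domains, len(lecs)))
```

Here the LECs are `[[0, 1], [2]]`. The domains that mix the two LECs differently pass the rank check, while domains that all use the same mixture do not, mirroring the intuition that domain diversity, rather than proxy completeness, is what is needed.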