Covariate shift arises when covariate distributions differ between source and target populations while the conditional distribution of the response remains invariant, and it underlies problems in missing data and causal inference. We propose a minimum Wasserstein distance estimation framework for inference under covariate shift that avoids explicit modeling of outcome regressions or importance weights. The resulting W-estimator admits a closed-form expression and is numerically equivalent to the classical 1-nearest-neighbor estimator, yielding a new optimal transport interpretation of nearest-neighbor methods. We establish root-$n$ asymptotic normality and show that the estimator is not asymptotically linear. As a consequence, it is super-efficient relative to the semiparametric efficient estimator under covariate shift in certain regimes, and uniformly so in missing-data problems. Numerical simulations, along with an analysis of a rainfall dataset, underscore the exceptional performance of our W-estimator.
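The abstract states that the W-estimator coincides numerically with the classical 1-nearest-neighbor estimator. As a rough illustration of that classical estimator only, the sketch below estimates the target-population mean of the response by matching each target covariate vector to its nearest source covariate vector and averaging the borrowed source responses. It assumes the target parameter is the mean of $Y$ under the target covariate distribution, Euclidean distance, and first-index tie-breaking; the function name `one_nn_target_mean` is hypothetical, and this is not the authors' implementation or their Wasserstein derivation.

```python
import math
from typing import Sequence


def one_nn_target_mean(
    source_x: Sequence[Sequence[float]],
    source_y: Sequence[float],
    target_x: Sequence[Sequence[float]],
) -> float:
    """Hypothetical sketch: 1-nearest-neighbor estimate of the target mean of Y.

    Assumptions (not from the source): Euclidean distance, ties broken by the
    lowest source index, target parameter = mean of Y over target covariates.
    """
    if len(source_x) != len(source_y):
        raise ValueError("source_x and source_y must have the same length")
    if not source_x or not target_x:
        raise ValueError("need at least one source and one target point")

    total = 0.0
    for x in target_x:
        # Index of the closest source covariate vector; min() keeps the
        # first index among equally distant candidates.
        nearest = min(range(len(source_x)), key=lambda i: math.dist(x, source_x[i]))
        # Borrow the matched source unit's response (Y | X is assumed invariant).
        total += source_y[nearest]
    return total / len(target_x)


if __name__ == "__main__":
    # Target points 0.9, 1.8, 2.1 match source points 1.0, 2.0, 2.0.
    print(one_nn_target_mean([[0.0], [1.0], [2.0]], [10.0, 20.0, 30.0], [[0.9], [1.8], [2.1]]))
```

Each target unit is imputed with the response of a single matched source unit, which is why the estimate needs no fitted outcome regression or estimated importance weights.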