Semi-Supervised Causal Inference: Generalizable and Double Robust Inference for Average Treatment Effects under Selection Bias with Decaying Overlap

Average treatment effect (ATE) estimation is a central problem in causal inference and has received significant recent attention, especially in the presence of high-dimensional confounders. We consider ATE estimation in high dimensions when the observed outcome (or label) itself may be missing. The conditional propensity score of the labeling indicator is allowed to depend on the covariates and also to decay uniformly with the sample size, so the unlabeled data size may grow faster than the labeled data size. This setting fills an important gap in both the semi-supervised (SS) and missing data literatures. We consider a missing at random (MAR) mechanism that allows selection bias, which is typically forbidden in the standard SS literature, and that does not require a positivity condition, which is typically required in the missing data literature. We first propose a general doubly robust 'decaying' MAR (DR-DMAR) SS estimator for the ATE, constructed from flexible (possibly non-parametric) nuisance estimators. The general DR-DMAR SS estimator is shown to be doubly robust, and asymptotically normal (and efficient) when all the nuisance models are correctly specified. Additionally, we propose a bias-reduced DR-DMAR SS estimator based on (parametric) targeted bias-reducing nuisance estimators along with a special asymmetric cross-fitting strategy. We demonstrate that the bias-reduced ATE estimator is asymptotically normal as long as either the outcome regression or the propensity score model is correctly specified. Moreover, the required sparsity conditions are weaker than those in all of the existing doubly robust causal inference literature, even under the regular supervised setting, which is a special degenerate case of our setting. Lastly, this work also contributes to the growing literature on generalizability in causal inference.
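To make the double robustness property concrete, the following is a minimal sketch of a generic augmented inverse-probability-weighted (AIPW-type) ATE estimate when outcomes are observed only for labeled units. It is not the DR-DMAR estimator itself: it uses no cross-fitting, no high-dimensional nuisance estimation, and a fixed rather than decaying labeling propensity. The simulated data-generating process, the function names (`simulate`, `dr_ate`, `true_m1`, `true_m0`, `true_pi`, `true_p`), and the assumption that treatment and covariates are observed for every unit are all illustrative choices, not details taken from the abstract.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical data-generating process (an assumption, not from the paper).
# The true ATE is 2 by construction.
def true_pi(x):   # treatment propensity P(T=1 | X=x)
    return expit(0.5 * x)


def true_p(x):    # labeling propensity P(R=1 | X=x); depends on x (selection bias)
    return expit(-1.5 + 0.5 * x)


def true_m1(x):   # outcome regression E[Y | T=1, X=x]
    return 3.0 + x


def true_m0(x):   # outcome regression E[Y | T=0, X=x]
    return 1.0 + x


def simulate(n, seed=0):
    """Draw (x, t, r, y) rows; y is None whenever the unit is unlabeled (r=0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = 1 if rng.random() < true_pi(x) else 0
        y = (true_m1(x) if t else true_m0(x)) + rng.gauss(0.0, 1.0)
        r = 1 if rng.random() < true_p(x) else 0
        rows.append((x, t, r, y if r else None))
    return rows


def dr_ate(rows, m1, m0, pi, p):
    """Generic doubly robust ATE estimate with missing outcomes.

    Averages m1(x) - m0(x) over all units and adds an inverse-probability
    weighted residual correction that uses labeled units only.
    """
    total = 0.0
    for x, t, r, y in rows:
        psi = m1(x) - m0(x)
        if r:  # the outcome is only available for labeled units
            if t:
                psi += (y - m1(x)) / (p(x) * pi(x))
            else:
                psi -= (y - m0(x)) / (p(x) * (1.0 - pi(x)))
        total += psi
    return total / len(rows)


if __name__ == "__main__":
    rows = simulate(40000, seed=1)
    zero = lambda x: 0.0
    # Both nuisance families correct.
    print(dr_ate(rows, true_m1, true_m0, true_pi, true_p))
    # Outcome model wrong, propensities correct.
    print(dr_ate(rows, zero, zero, true_pi, true_p))
    # Propensities wrong (constants), outcome model correct.
    print(dr_ate(rows, true_m1, true_m0, lambda x: 0.5, lambda x: 0.2))
```

In this toy setup each of the three calls should land near the true value of 2, which is the sense in which the score is doubly robust: the estimate remains consistent when either the outcome regressions or the two propensity functions are correct. The sketch plugs in known (oracle) nuisance functions; in practice they would have to be estimated from data.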
