Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects

Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution of the potential outcomes. Stratification on pretreatment covariates can yield sharper bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires binning covariates or estimating the conditional distributions of the potential outcomes given the covariates. Binning can result in substantial efficiency loss and become challenging to implement, even with a moderate number of covariates. Estimating conditional distributions, on the other hand, may yield invalid inference if the distributions are inaccurately estimated, such as when a misspecified model is used or when the covariates are high-dimensional. In this paper, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands. Our method, based on duality theory for optimal transport problems, has four key properties. First, in randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. A simple extension of our method to observational studies is doubly robust in the usual sense. Second, if nuisance parameters are estimated at semiparametric rates, our estimator is asymptotically unbiased for the sharp partial identification bound. Third, we can apply the multiplier bootstrap to select covariates and models without sacrificing validity, even if the true model is not selected. Finally, our method is computationally efficient. Overall, in three empirical applications, our method consistently reduces the width of estimated identified sets and confidence intervals without making additional structural assumptions.
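
To illustrate why an optimal transport dual can tolerate inaccurate inputs, the following is a minimal sketch of weak duality in a covariate-free toy problem. It is not the authors' implementation. The estimand (the fraction of units with Y(1) > Y(0)), the binary outcomes, the marginal probabilities, and all function names are assumptions chosen for illustration. The sketch shows that any dual-feasible pair of functions, however poorly chosen, yields a valid but possibly conservative lower bound, while a well-chosen pair attains the sharp bound.

```python
# Toy illustration of weak duality for a partially identified estimand
# theta = E[f(Y(0), Y(1))]. Only the marginals of Y(0) and Y(1) are
# identified, so the sharp lower bound is a minimum over couplings.
# All names and numbers below are hypothetical.

def benefit(y0, y1):
    """f(y0, y1) = 1 if the unit benefits from treatment, else 0."""
    return 1.0 if y1 > y0 else 0.0

def c_transform(phi, f, support0, support1):
    """Largest psi such that phi(y0) + psi(y1) <= f(y0, y1) for all pairs."""
    return {y1: min(f(y0, y1) - phi[y0] for y0 in support0) for y1 in support1}

def dual_lower_bound(phi, p0, p1, f):
    """E[phi(Y(0))] + E[psi(Y(1))] for the dual-feasible pair built from phi."""
    psi = c_transform(phi, f, list(p0), list(p1))
    return sum(p0[y] * phi[y] for y in p0) + sum(p1[y] * psi[y] for y in p1)

def sharp_lower_bound_binary(p0, p1, f):
    """Sharp lower bound for binary outcomes.

    Couplings of two Bernoulli marginals are indexed by t = P(Y0=1, Y1=1),
    and the objective is linear in t, so the minimum is at an endpoint.
    """
    a, b = p0[1], p1[1]
    t_min, t_max = max(0.0, a + b - 1.0), min(a, b)

    def value(t):
        pi = {(1, 1): t, (1, 0): a - t, (0, 1): b - t, (0, 0): 1.0 - a - b + t}
        return sum(mass * f(y0, y1) for (y0, y1), mass in pi.items())

    return min(value(t_min), value(t_max))

# Hypothetical marginals of the control and treated potential outcomes.
p0 = {0: 0.6, 1: 0.4}
p1 = {0: 0.3, 1: 0.7}

sharp = sharp_lower_bound_binary(p0, p1, benefit)

# A crude dual function gives a valid but loose bound.
loose = dual_lower_bound({0: 0.0, 1: 0.0}, p0, p1, benefit)

# A better dual function recovers the sharp bound in this example.
tight = dual_lower_bound({0: 0.0, 1: -1.0}, p0, p1, benefit)

print(sharp, loose, tight)
```

In this toy setting the dual functions play the role that estimated nuisance quantities play in the abstract: a poor choice costs tightness, not validity. The covariate-assisted version described in the abstract would let the dual functions depend on covariates, which this sketch does not attempt.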
