In this paper, we extend the Riesz representation framework to causal inference under sample selection, where both treatment assignment and outcome observability are non-random. Formulating the problem in terms of a Riesz representer enables stable estimation and a transparent decomposition of omitted variable bias into three interpretable components: a scale factor identified from the data, the strength of outcome confounding, and the strength of selection confounding. For estimation, we employ the ForestRiesz estimator, which accounts for selective outcome observability while avoiding the instability that comes from directly inverting estimated propensity scores. We assess finite-sample performance in a simulation study and show that conventional double machine learning approaches can be highly sensitive to tuning parameters because they rely on inverse probability weighting. The ForestRiesz estimator, by contrast, delivers more stable performance because it learns the Riesz representer directly through automatic debiased machine learning. In an empirical application to the gender wage gap in the U.S., our ForestRiesz approach yields larger estimates of the gap than a standard double machine learning approach that ignores sample selection, suggesting that ignoring selection leads to an underestimation of the gender wage gap. A sensitivity analysis indicates that implausibly strong unobserved confounding would be required to overturn these results. Overall, our approach provides a unified, robust, and computationally attractive framework for causal inference under sample selection.
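To make the three-part bias decomposition concrete, the sketch below computes an omitted variable bias bound in the multiplicative form |bias| <= S * C_Y * C_D known from the general Riesz-representer literature (Chernozhukov, Cinelli, Newey, Sharma and Syrgkanis, "Long Story Short: Omitted Variable Bias in Causal Machine Learning"). It is an illustration under assumptions, not this paper's exact formula: we assume the sample-selection bound takes the same product form, with the selection confounding term playing the role of C_D. The function name `ovb_bound` and its parameters are hypothetical.

```python
import math


def ovb_bound(resid_var_y: float, riesz_second_moment: float,
              r2_outcome: float, r2_riesz: float) -> float:
    """Hypothetical helper: bound on |bias| assuming the product form S * C_Y * C_D.

    resid_var_y         -- E[(Y - g_s)^2], residual outcome variance (estimable from data)
    riesz_second_moment -- E[alpha_s^2], second moment of the short Riesz representer
    r2_outcome          -- share of residual outcome variation explained by the
                           omitted confounder (assumed sensitivity parameter)
    r2_riesz            -- R^2 of the short representer on the long one (assumed
                           sensitivity parameter; smaller means stronger confounding)
    """
    if not 0.0 <= r2_outcome <= 1.0:
        raise ValueError("r2_outcome must lie in [0, 1]")
    if not 0.0 < r2_riesz <= 1.0:
        raise ValueError("r2_riesz must lie in (0, 1]")
    # Scale factor S: depends only on observable quantities.
    s = math.sqrt(resid_var_y * riesz_second_moment)
    # Outcome confounding strength C_Y.
    c_y = math.sqrt(r2_outcome)
    # Selection confounding strength (assumed analogue of C_D): relative gain in the
    # representer's second moment once the omitted confounder is included.
    c_d = math.sqrt((1.0 - r2_riesz) / r2_riesz)
    return s * c_y * c_d


# Hypothetical inputs: S = 2, C_Y = 0.2, C_D = 0.5, so the bound is about 0.2.
bound = ovb_bound(resid_var_y=4.0, riesz_second_moment=1.0,
                  r2_outcome=0.04, r2_riesz=0.8)
print(round(bound, 6))
```

In a sensitivity analysis of this kind, one asks how large the two confounding strengths must be before the bound reaches the point estimate. Results are robust when that requires values far beyond what observed covariates suggest is plausible.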