Selection bias is pervasive in observational studies. For example, large-scale biobank data can exhibit "healthy volunteer bias" when respondents are healthier and of higher socio-economic status than the population they are meant to represent. Recovering causal effects from such subpopulations is an important problem in causal inference, because an average treatment effect (ATE) estimated from a selected population can be a severely biased estimate of the ATE in the whole population. In this paper, we investigate the identifiability of the ATE under selection bias. We provide necessary and sufficient conditions for ATE identifiability, leveraging weak assumptions on the probability classes that characterize the propensity score and the selection probability. Compared to previous work, our results extend existing graphical identifiability criteria and offer a more comprehensive understanding of causal effect identification in the presence of selection bias, under strictly weaker conditions.