Selection bias is pervasive in observational studies. For example, large-scale biobank data can exhibit ``healthy volunteer bias'' when respondents are healthier and of higher socio-economic status than the population they are meant to represent. Recovering causal effects from such subpopulations is an important problem in causal inference, because an average treatment effect (ATE) estimated from a selected population can be a severely biased estimate of the ATE in the whole population. In this paper, we investigate the identifiability of the ATE under selection bias. We provide necessary and sufficient conditions for ATE identifiability, leveraging weak assumptions on probability classes to characterize the propensity score and the selection probability. Compared to previous work, our results extend existing graphical identifiability criteria and, under strictly weaker conditions, offer a more comprehensive understanding of causal effect identification in the presence of selection bias.
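To make the bias concrete, the following is a minimal simulation sketch. It is not the paper's method or setting. The data-generating process is entirely an assumption chosen for illustration: a randomized binary treatment, an outcome equal to the true ATE times the treatment plus standard normal noise, and a hypothetical selection rule that keeps only units with a positive outcome (a crude stand-in for healthier individuals being more likely to participate). The names `simulate` and `diff_in_means` are likewise illustrative.

```python
import random


def simulate(n=200_000, true_ate=1.0, seed=0):
    """Compare the difference in means on the full vs. the selected sample.

    Assumed toy model (not taken from the paper):
      T ~ Bernoulli(0.5), Y = true_ate * T + N(0, 1), selected iff Y > 0.
    """
    rng = random.Random(seed)
    full = {0: [], 1: []}      # outcomes by treatment arm, whole population
    selected = {0: [], 1: []}  # outcomes by treatment arm, selected units only

    for _ in range(n):
        t = 1 if rng.random() < 0.5 else 0
        y = true_ate * t + rng.gauss(0.0, 1.0)
        full[t].append(y)
        if y > 0.0:  # hypothetical selection rule that depends on the outcome
            selected[t].append(y)

    def diff_in_means(groups):
        treated = sum(groups[1]) / len(groups[1])
        control = sum(groups[0]) / len(groups[0])
        return treated - control

    return diff_in_means(full), diff_in_means(selected)


full_est, selected_est = simulate()
print(f"full-population estimate:   {full_est:.3f}")
print(f"selected-sample estimate:   {selected_est:.3f}")
```

Under these assumptions the full-population estimate sits close to the true ATE of 1, while the estimate from the selected sample is attenuated to roughly half of it, even though treatment was randomized. Selection on the outcome is only one mechanism; other selection rules can bias the estimate in either direction.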