Selection bias is a common concern in epidemiologic studies. In the literature, selection bias is often viewed as a missing data problem. Popular approaches to adjust for bias due to missing data, such as inverse probability weighting, rely on the assumption that data are missing at random and can yield biased results if this assumption is violated. In observational studies with outcome data missing not at random, Heckman's sample selection model can be used to adjust for bias due to missing data. In this paper, we review Heckman's method and a similar approach proposed by Tchetgen Tchetgen and Wirth (2017). We then discuss how to apply these methods to Mendelian randomization analyses using individual-level data, with missing data for either the exposure or outcome or both. We explore whether genetic variants associated with participation can be used as instruments for selection. We then describe how to obtain missingness-adjusted Wald ratio, two-stage least squares and inverse variance weighted estimates. The two methods are evaluated and compared in simulations, with results suggesting that they can both mitigate selection bias but may yield parameter estimates with large standard errors in some settings. In an illustrative real-data application, we investigate the effects of body mass index on smoking using data from the Avon Longitudinal Study of Parents and Children.
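For orientation, the sample selection model referred to above is commonly written in the following textbook form. This is a sketch under assumed notation that is ours, not necessarily the paper's: $Y$ is the outcome, $X$ the covariates, $R$ the indicator that $Y$ is observed, and $Z$ an instrument for selection, meaning a variable that affects $R$ but does not enter the outcome model. The final line gives the standard Wald ratio for a single genetic variant $G$, where $\hat\beta_{GY}$ and $\hat\beta_{GX}$ denote the estimated variant-outcome and variant-exposure associations.

```latex
\begin{align*}
  % Outcome model; Y is observed only when R = 1
  Y &= \beta_0 + \beta_1 X + \varepsilon, \\
  % Selection (participation) model with instrument for selection Z
  R &= \mathbf{1}\{\gamma_0 + \gamma_1 X + \gamma_2 Z + \eta > 0\}, \\
  % Joint normality of the errors; rho != 0 means data are missing not at random
  \begin{pmatrix} \varepsilon \\ \eta \end{pmatrix}
    &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
       \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right), \\
  % Mean of the observed outcomes: the usual regression plus an inverse Mills ratio term
  E[Y \mid X, Z, R = 1]
    &= \beta_0 + \beta_1 X
       + \rho\sigma\, \lambda(\gamma_0 + \gamma_1 X + \gamma_2 Z),
  \qquad \lambda(u) = \frac{\phi(u)}{\Phi(u)}, \\
  % Wald ratio estimate of the causal effect from a single variant G
  \hat\theta_{\mathrm{Wald}} &= \frac{\hat\beta_{GY}}{\hat\beta_{GX}}.
\end{align*}
```

Under this formulation, a complete-case regression of $Y$ on $X$ omits the $\lambda(\cdot)$ term, and that omission is the source of the bias when $\rho \neq 0$. The two-step version of Heckman's approach first fits the selection model by probit regression and then includes the estimated $\lambda$ as an additional regressor in the outcome model.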