Developments in genome-wide association studies and the increasing availability of summary genetic association data have made two-sample Mendelian randomization (MR) with summary data increasingly popular. Conventional two-sample MR methods often use the same sample both to select relevant genetic variants and to construct the final causal estimates. This practice often leads to biased causal effect estimates because of the well-known "winner's curse" phenomenon. To address this fundamental challenge, we first examine its consequences for causal effect estimation, both theoretically and empirically. We then propose a novel framework that systematically breaks the winner's curse, leading to unbiased association effect estimates for the selected genetic variants. Building on the proposed framework, we introduce a novel rerandomized inverse variance weighted (RIVW) estimator that is consistent when selection and parameter estimation are conducted on the same sample. Under appropriate conditions, we show that the proposed RIVW estimator of the causal effect is asymptotically normal and that its variance can be well estimated. We illustrate the finite-sample performance of our approach through Monte Carlo experiments and two empirical examples.
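The winner's curse referred to above can be illustrated with a small simulation. The following is a minimal sketch under assumed settings, not the RIVW procedure itself: the function name `simulate_winners_curse`, the effect-size distribution, the standard error, and the selection threshold are all hypothetical choices made for illustration. It selects variants by thresholding their z-scores in one sample, then compares the estimates from that same sample with the true effects and with estimates from an independent replication sample.

```python
import numpy as np


def simulate_winners_curse(n_variants=20000, effect_sd=0.02, se=0.01,
                           z_threshold=5.0, seed=0):
    """Compare same-sample and replication estimates for selected variants.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # True variant-exposure effects (assumed normally distributed).
    true_beta = rng.normal(0.0, effect_sd, size=n_variants)

    # Two independent noisy estimates of the same true effects.
    beta_hat_select = true_beta + rng.normal(0.0, se, size=n_variants)
    beta_hat_replicate = true_beta + rng.normal(0.0, se, size=n_variants)

    # Select variants whose z-score in the selection sample passes the threshold.
    selected = np.abs(beta_hat_select / se) > z_threshold

    # Align signs to the selection-sample estimate so that effects of
    # opposite direction do not cancel when averaged.
    sign = np.sign(beta_hat_select[selected])

    return {
        "n_selected": int(selected.sum()),
        "mean_true": float(np.mean(sign * true_beta[selected])),
        "mean_same_sample": float(np.mean(sign * beta_hat_select[selected])),
        "mean_replicate": float(np.mean(sign * beta_hat_replicate[selected])),
    }


if __name__ == "__main__":
    for key, value in simulate_winners_curse().items():
        print(f"{key}: {value}")
```

In this sketch, the same-sample average exceeds the average true effect among the selected variants, whereas the replication average stays close to it. The bias therefore comes from reusing the selection sample for estimation, which is the situation the abstract describes.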