Mendelian randomization (MR) is a widely used method for estimating the causal effect of a risk factor on disease. A fundamental part of any MR analysis is choosing appropriate genetic variants as instrumental variables. Genome-wide association studies often reveal hundreds of genetic variants that are robustly associated with a risk factor, but in some situations investigators may have confidence in the instrument validity of only a smaller subset of variants. Nevertheless, using additional instruments may be optimal from the perspective of mean squared error even if they are slightly invalid; a small bias in estimation may be a price worth paying for a larger reduction in variance. For this purpose, we consider a method for "focused" instrument selection whereby genetic variants are selected to minimise the estimated asymptotic mean squared error of causal effect estimates. In a setting of many weak and locally invalid instruments, we propose a novel strategy to construct confidence intervals for post-selection focused estimators that guards against the worst-case loss in asymptotic coverage. In two empirical applications, (i) validating lipid drug targets and (ii) investigating vitamin D effects on a wide range of outcomes, our findings suggest that the optimal selection of instruments involves not only a small number of biologically justified instruments but also many potentially invalid instruments.
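The bias-variance trade-off behind focused selection can be illustrated with a toy calculation. The sketch below is not the paper's estimator: it assumes each variant yields an independent estimate with a known variance and a known bias, combines them by inverse-variance weighting, and searches over subsets of the additional variants for the lowest mean squared error. In practice the biases are unknown and must be estimated; the function names (`ivw_mse`, `select_focused`) and all numbers are hypothetical.

```python
from itertools import combinations

def ivw_mse(variances, biases):
    """MSE (bias^2 + variance) of an inverse-variance weighted average."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    bias = sum(w * b for w, b in zip(weights, biases)) / total
    variance = 1.0 / total
    return bias ** 2 + variance

def select_focused(core, extra):
    """Keep every core variant; add the subset of extra variants minimising MSE.

    core, extra: lists of (variance, bias) pairs, one pair per variant.
    Returns (indices of chosen extra variants, resulting MSE).
    """
    best_subset, best_mse = (), ivw_mse(*zip(*core))
    for k in range(1, len(extra) + 1):
        for idx in combinations(range(len(extra)), k):
            variances, biases = zip(*(core + [extra[i] for i in idx]))
            mse = ivw_mse(variances, biases)
            if mse < best_mse:
                best_subset, best_mse = idx, mse
    return best_subset, best_mse

# Hypothetical inputs: two valid core variants (bias 0),
# one slightly invalid and one strongly invalid extra variant.
core = [(0.04, 0.0), (0.04, 0.0)]
extra = [(0.04, 0.05), (0.04, 0.5)]

core_mse = ivw_mse(*zip(*core))
best_subset, best_mse = select_focused(core, extra)
print(best_subset, best_mse, core_mse)
```

With these inputs the slightly invalid variant is retained because the variance it removes outweighs the squared bias it introduces, while the strongly invalid variant is excluded. The exhaustive subset search is only practical for a handful of candidates and is used here purely for clarity.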