Mendelian randomization (MR) is a widely used method to estimate the causal relationship between a risk factor and a disease. A fundamental part of any MR analysis is choosing appropriate genetic variants as instrumental variables. Genome-wide association studies often reveal that hundreds of genetic variants may be robustly associated with a risk factor, but in some situations investigators may believe that only a smaller subset of these variants are valid instruments. Nevertheless, using the full set of instruments could lead to biased but more precise estimates, and therefore in terms of mean squared error it may be unclear which set of instruments is optimal. To address this, we consider a method for "focused" instrument selection whereby genetic variants are selected to minimise the estimated asymptotic mean squared error of causal effect estimates. In a setting of many weak and locally invalid instruments, we consider a novel strategy to construct confidence intervals for post-selection focused estimators which guards against the worst-case loss in asymptotic coverage. In empirical applications that (i) validate lipid drug targets and (ii) investigate the effects of vitamin D on a wide range of outcomes, our findings suggest that the optimal selection of instruments involves not only a small number of biologically justified valid instruments, but also many potentially invalid instruments.
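The bias-variance trade-off described above can be illustrated with a minimal sketch. It is not the selection procedure proposed here; it assumes summary-level data on independent variants, a simple inverse-variance weighted (IVW) estimator, and a squared-bias estimate taken from the difference between the full-set and valid-set estimates. The function names `ivw` and `focused_choice` and all input values are hypothetical.

```python
import numpy as np

def ivw(beta_x, beta_y, se_y):
    """IVW causal estimate and its first-order variance (assumes independent variants)."""
    w = beta_x**2 / se_y**2                     # weight of each variant's ratio estimate
    est = np.sum(w * beta_y / beta_x) / np.sum(w)
    return est, 1.0 / np.sum(w)

def focused_choice(beta_x, beta_y, se_y, valid):
    """Compare the valid-only and full instrument sets by estimated MSE.

    `valid` is a boolean mask marking the variants believed to be valid.
    Illustrative assumption: the valid-set estimator is treated as unbiased,
    so the bias of the full-set estimator is estimated by their difference.
    """
    est_v, var_v = ivw(beta_x[valid], beta_y[valid], se_y[valid])
    est_f, var_f = ivw(beta_x, beta_y, se_y)
    # (est_f - est_v)^2 overstates the squared bias by Var(est_f - est_v),
    # which equals var_v - var_f here; subtract it and truncate at zero.
    bias2 = max((est_f - est_v) ** 2 - (var_v - var_f), 0.0)
    mse = {"valid": var_v, "full": bias2 + var_f}
    return min(mse, key=mse.get), mse

# Hypothetical data: 10 variants, the first 3 believed valid, true effect 0.5.
beta_x = np.full(10, 0.1)
se_y = np.full(10, 0.01)
valid = np.arange(10) < 3
pleiotropy = np.where(valid, 0.0, 0.05)         # direct effects of the invalid variants

choice_clean, _ = focused_choice(beta_x, 0.5 * beta_x, se_y, valid)
choice_biased, _ = focused_choice(beta_x, 0.5 * beta_x + pleiotropy, se_y, valid)
print(choice_clean, choice_biased)
```

In this toy setting the full set is preferred when the extra variants add precision without detectable bias, and the valid subset is preferred once the pleiotropic effects are large relative to the sampling error.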