Administrative patient-care data, such as electronic health records and medical and pharmaceutical claims, are increasingly used for population-based scientific research. Because the vast sample sizes yield very small standard errors, researchers need to pay closer attention to potential biases in estimates of the association parameters of interest, specifically to biases that do not diminish as the sample size increases. Among these sources of bias, this paper focuses on selection bias. We present an analytic framework based on directed acyclic graphs that guides applied researchers in dissecting how different sources of selection bias may affect their parameter estimates. We review four easy-to-implement weighting approaches for reducing selection bias and use a simulation study to explain when they can help in practical analyses of real-world data. We provide annotated R code implementing these methods.
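To make the idea of weighting against selection bias concrete, the following is a minimal toy sketch, not the paper's annotated R code. It is written in Python with only the standard library, targets a simple population mean rather than an association parameter, and assumes the selection probabilities are known exactly, which is rarely true in practice. The function name `simulate_selection`, the covariate `x`, and all numeric settings are hypothetical choices made for illustration. Inverse probability weighting is used here as one familiar weighting idea; the abstract does not say whether it is among the four approaches reviewed.

```python
import random


def simulate_selection(n, seed=0):
    """Toy simulation: selection into the sample depends on a covariate x
    that also shifts the outcome y, so the unweighted sample mean is biased.

    Returns (population_mean, naive_sample_mean, ipw_sample_mean).
    All parameter values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    population_y = []
    sampled_y = []
    sampled_w = []
    for _ in range(n):
        # Binary covariate, present in half of the population.
        x = 1 if rng.random() < 0.5 else 0
        # Outcome is higher on average when x == 1.
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        population_y.append(y)
        # Assumed-known selection probability: x == 1 is over-represented.
        p_selected = 0.8 if x == 1 else 0.2
        if rng.random() < p_selected:
            sampled_y.append(y)
            # Inverse probability of selection weight.
            sampled_w.append(1.0 / p_selected)

    population_mean = sum(population_y) / n
    naive_mean = sum(sampled_y) / len(sampled_y)
    # Normalized (Hajek-type) weighted mean over the selected units.
    ipw_mean = sum(w * y for w, y in zip(sampled_w, sampled_y)) / sum(sampled_w)
    return population_mean, naive_mean, ipw_mean


if __name__ == "__main__":
    for n in (2_000, 20_000, 200_000):
        truth, naive, ipw = simulate_selection(n, seed=1)
        print(f"n={n}: naive error={naive - truth:+.3f}, weighted error={ipw - truth:+.3f}")
```

In this setup the error of the unweighted mean stays roughly constant as `n` grows, while the weighted estimate moves toward the population mean. This illustrates the kind of bias that does not diminish with sample size; when selection probabilities must themselves be estimated, the correction is only as good as the model used for them.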