Although increasingly used for research, electronic health records (EHRs) often lack gold-standard assessment of key data elements. Linking EHRs to other data sources with higher-quality measurements can improve statistical inference, but such analyses must account for selection bias if the linked data source arises from a non-probability sample. We propose a set of novel estimators targeting the average treatment effect (ATE) that combine information from binary outcomes measured with error in a large, population-representative EHR database with gold-standard outcomes obtained from a smaller validation sample subject to selection bias. We evaluate our approach in extensive simulations and an analysis of data from the Adult Changes in Thought (ACT) study, a longitudinal study of incident dementia in a cohort of Kaiser Permanente Washington members with linked EHR data. For a subset of deceased ACT participants who consented to brain autopsy prior to death, gold-standard measures of Alzheimer's disease neuropathology are available. Our proposed estimators reduced bias and improved efficiency for the ATE, facilitating valid inference with EHR data when key data elements are ascertained with error.
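To make the general idea concrete, the following is a minimal illustrative sketch, not the estimators proposed in this work. It simulates a binary outcome measured with error in a full sample and a gold-standard outcome observed only in a selected validation subset, then compares a naive ATE estimate with a simple augmented estimate that adds an inverse-probability-weighted correction from the validation subset. All names (`simulate`, `naive_ate`, `augmented_ate`) and every numeric setting are hypothetical. The sketch also assumes randomized treatment and selection that depends only on an observed covariate and the error-prone outcome.

```python
import random


def simulate(n, seed=0):
    """Simulate rows (x, a, y, y_star, s); all settings are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)                      # binary covariate
        a = int(rng.random() < 0.5)                      # randomized treatment (assumption)
        y = int(rng.random() < 0.2 + 0.2 * a + 0.2 * x)  # true outcome; true ATE is 0.2
        # Error-prone outcome: sensitivity 0.8, specificity 0.9 (assumed values).
        y_star = int(rng.random() < (0.8 if y else 0.1))
        # Validation selection depends on x and y_star only (assumption).
        s = int(rng.random() < 0.1 + 0.2 * y_star + 0.1 * x)
        rows.append((x, a, y, y_star, s))
    return rows


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def naive_ate(rows):
    """Difference in means of the error-prone outcome, ignoring measurement error."""
    arm = {0: [], 1: []}
    for x, a, y, y_star, s in rows:
        arm[a].append(y_star)
    return sum(arm[1]) / len(arm[1]) - sum(arm[0]) / len(arm[0])


def augmented_ate(rows):
    """Naive estimate plus a weighted correction from the validation subset."""
    # Estimate selection probabilities by the selected fraction in each (x, y_star) cell.
    counts = {}
    for x, a, y, y_star, s in rows:
        total, selected = counts.get((x, y_star), (0, 0))
        counts[(x, y_star)] = (total + 1, selected + s)
    pi_hat = {cell: sel / tot for cell, (tot, sel) in counts.items()}

    # The gold-standard outcome y is used only where s == 1.
    diffs = {0: [], 1: []}
    weights = {0: [], 1: []}
    for x, a, y, y_star, s in rows:
        if s == 1:
            diffs[a].append(y - y_star)
            weights[a].append(1.0 / pi_hat[(x, y_star)])
    correction = (weighted_mean(diffs[1], weights[1])
                  - weighted_mean(diffs[0], weights[0]))
    return naive_ate(rows) + correction


if __name__ == "__main__":
    data = simulate(100000, seed=1)
    print("naive ATE estimate:    ", round(naive_ate(data), 3))
    print("augmented ATE estimate:", round(augmented_ate(data), 3))
```

In this toy setup the naive estimate is attenuated because misclassification shrinks the difference between arms, while the weighted correction recovers the discrepancy between the gold-standard and error-prone outcomes from the selected subset. Real applications would need to handle confounding and a selection mechanism that is not known to depend only on observed variables.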