Longitudinal electronic health record (EHR) data offer opportunities to study biomarker trajectories. However, association estimates (the primary inferential target) from standard models designed for regular observation times may be biased by a two-stage hierarchical missingness mechanism. The first stage is the visiting process (informative presence), in which encounters occur at irregular times driven by patient health status; the second is the observation process (informative observation), in which biomarkers are selectively measured during visits. To address both mechanisms, we propose a unified semiparametric joint modeling framework that simultaneously characterizes the visiting, biomarker observation, and longitudinal outcome processes. Central to this framework is a shared subject-specific Gaussian latent variable that captures unmeasured frailty and induces dependence across all three components. We develop a three-stage estimation procedure and establish the consistency and asymptotic normality of the resulting estimators. We also introduce a sequential procedure that imputes missing biomarkers before adjusting for irregular visiting, and we examine its performance. Simulation results demonstrate that our method yields unbiased estimates under this two-stage mechanism, whereas existing approaches can be substantially biased; notably, methods that adjust only for irregular visiting may exhibit even greater bias than those that ignore both mechanisms. We apply our framework to data from the All of Us Research Program to investigate associations between neighborhood-level socioeconomic status indicators and six blood-based biomarker trajectories, providing a robust tool for outpatient settings where irregular monitoring and selective measurement are prevalent.
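To illustrate how the two-stage mechanism can distort an association estimate, the following is a minimal simulation sketch, not the estimation procedure proposed here. It assumes a deliberately simplified setting: a binary covariate, no time trend, a Poisson visit count, and a logistic observation probability, all tied together by one shared Gaussian latent variable. The function names (`simulate`, `mean_difference`) and every parameter value are hypothetical choices made for illustration only.

```python
import numpy as np


def mean_difference(x, y):
    """Slope of y on a binary covariate x (equals the OLS slope in this case)."""
    return y[x == 1].mean() - y[x == 0].mean()


def simulate(n_subjects=20000, beta1=1.0, gamma_visit=0.5, gamma_obs=1.0, seed=0):
    """Compare a naive pooled estimate of beta1 with a full-data benchmark.

    All numeric settings are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n_subjects)    # binary covariate
    b = rng.normal(0.0, 1.0, size=n_subjects)    # shared Gaussian latent variable

    # Stage 1 (informative presence): the visit count depends on b.
    n_visits = rng.poisson(np.exp(0.5 + gamma_visit * b))
    subj = np.repeat(np.arange(n_subjects), n_visits)  # one row per visit

    # Stage 2 (informative observation): at each visit the biomarker is
    # measured with a probability that depends on both x and b.
    logit = -2.0 + 4.0 * x[subj] + gamma_obs * b[subj]
    observed = rng.random(subj.size) < 1.0 / (1.0 + np.exp(-logit))

    # Longitudinal outcome shares the same latent variable b.
    y = beta1 * x[subj] + b[subj] + rng.normal(0.0, 0.5, size=subj.size)

    # Naive analysis: pool the observed measurements, ignore both mechanisms.
    naive = mean_difference(x[subj][observed], y[observed])

    # Benchmark: one measurement per subject with no selection at all.
    y_full = beta1 * x + b + rng.normal(0.0, 0.5, size=n_subjects)
    benchmark = mean_difference(x, y_full)
    return naive, benchmark


if __name__ == "__main__":
    naive, benchmark = simulate()
    print(f"true beta1 = 1.0, naive = {naive:.2f}, benchmark = {benchmark:.2f}")
```

In this sketch the selection on the latent variable is stronger in the group whose biomarker is rarely measured, so the observed measurements over-represent high-frailty subjects unevenly across covariate groups. Under these assumed settings the naive slope should fall noticeably below the true value, while the benchmark stays close to it.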