Dynamic treatment regimes (DTRs) formalize medical decision-making as a sequence of rules for different stages, mapping patient-level information to recommended treatments. In practice, estimating an optimal DTR using observational data from electronic medical record (EMR) databases can be complicated by covariates that are missing not at random (MNAR) due to informative monitoring of patients. Since complete case analysis can result in consistent estimation of outcome model parameters under the assumption of outcome-independent missingness, Q-learning is a natural approach to accommodating MNAR covariates. However, the backward induction algorithm used in Q-learning can introduce challenges, as MNAR covariates at later stages can result in MNAR pseudo-outcomes at earlier stages, leading to suboptimal DTRs, even if the longitudinal outcome variables are fully observed. To address this unique missing data problem in DTR settings, we propose two weighted Q-learning approaches where inverse probability weights for missingness of the pseudo-outcomes are obtained through estimating equations with valid nonresponse instrumental variables or sensitivity analysis. Asymptotic properties of the weighted Q-learning estimators are derived and the finite-sample performance of the proposed methods is evaluated and compared with alternative methods through extensive simulation studies. Using EMR data from the Medical Information Mart for Intensive Care database, we apply the proposed methods to investigate the optimal fluid strategy for sepsis patients in intensive care units.
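The backward induction step and the role of the inverse probability weights can be illustrated with a minimal sketch. It is not the estimator proposed here. It assumes two stages, binary treatments coded 0/1, linear Q-functions, and a single final outcome `y`. It also assumes the weights `weights` (one over the probability that the stage-2 covariate is observed) have already been estimated elsewhere; the estimating equations with nonresponse instrumental variables and the sensitivity analysis are not shown. All function and variable names are hypothetical.

```python
import numpy as np


def weighted_least_squares(X, y, w):
    """Return argmin_b sum_i w_i * (y_i - x_i' b)^2."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def design(x, a):
    """Design matrix with columns: intercept, x, a, a * x (assumed Q-model)."""
    return np.column_stack([np.ones_like(x), x, a, a * x])


def weighted_q_learning(x1, a1, x2, a2, y, observed2, weights):
    """Two-stage Q-learning with an inverse-probability-weighted stage-1 fit.

    observed2 : boolean array, True where the stage-2 covariate x2 is observed.
    weights   : assumed given, 1 / P(x2 observed) for each patient.
    """
    cc = observed2
    n_cc = int(cc.sum())

    # Stage 2: unweighted complete-case regression of y on (x2, a2).
    beta2 = weighted_least_squares(design(x2[cc], a2[cc]), y[cc], np.ones(n_cc))

    # Pseudo-outcome: predicted outcome under the better stage-2 treatment.
    # It can only be computed where x2 is observed, so it inherits the
    # missingness of x2.
    q2_a0 = design(x2[cc], np.zeros(n_cc)) @ beta2
    q2_a1 = design(x2[cc], np.ones(n_cc)) @ beta2
    pseudo = np.maximum(q2_a0, q2_a1)

    # Stage 1: regress the pseudo-outcome on (x1, a1), weighting complete
    # cases by the inverse probability of observing the pseudo-outcome.
    beta1 = weighted_least_squares(design(x1[cc], a1[cc]), pseudo, weights[cc])
    return beta1, beta2
```

Under these assumptions, the estimated stage-2 rule recommends treatment 1 when `beta2[2] + beta2[3] * x2 > 0`, and the stage-1 rule is read off `beta1` in the same way. The stage-2 fit is left unweighted because complete-case analysis is taken to be consistent there under outcome-independent missingness; only the stage-1 regression on the pseudo-outcome is weighted.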