Retrospective cohorts can be extracted from Electronic Health Records (EHR) to study prevalence, time until disease or event occurrence, and cure proportion in real-world scenarios. However, EHR are collected for patient care rather than research, so they typically present complexities, such as patients with missing baseline disease status. Prevalence-Incidence (PI) models, which use a two-component mixture model to account for this missing data, have been proposed. However, PI models are biased in settings in which some individuals will never experience the endpoint (they are 'cured'). To address this, we propose a Prevalence Incidence Cure (PIC) model, a three-component mixture model that combines the PI model framework with a cure model. Our PIC model enables estimation of the prevalence, the time-to-incidence, and the cure proportion, and allows covariates to affect each of these. We adopt a Bayesian inference approach and focus on the interpretability of the prior. In a simulation study, we show that the PIC model estimates the survival probability with smaller bias than a PI model, and we compare inference under vague, informative and misspecified priors. We illustrate our model using a dataset of 1964 patients undergoing treatment for Diabetic Macular Oedema, demonstrating improved fit under the PIC model.
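To make the three-component structure concrete, the following is a minimal sketch of how the cumulative probability of having experienced the endpoint might be assembled from the three mixture components. It is not the paper's implementation: the Weibull latency distribution, the function names, and the parameter names (`p_prev`, `p_cure`, `shape`, `scale`) are all assumptions chosen for illustration, and covariate effects and the Bayesian priors are omitted.

```python
import math


def weibull_cdf(t, shape, scale):
    """CDF of a Weibull distribution (assumed latency distribution for incident cases)."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def pic_cumulative_probability(t, p_prev, p_cure, shape, scale):
    """Probability of having experienced the endpoint by time t >= 0.

    Hypothetical parameterisation of a three-component mixture:
      p_prev : proportion with the disease already present at baseline
      p_cure : proportion who will never experience the endpoint
      1 - p_prev - p_cure : proportion who experience it at some time t > 0
    """
    if p_prev < 0 or p_cure < 0 or p_prev + p_cure > 1:
        raise ValueError("mixture proportions must be non-negative and sum to at most 1")
    p_inc = 1.0 - p_prev - p_cure
    # Prevalent cases contribute at t = 0; cured individuals never contribute.
    return p_prev + p_inc * weibull_cdf(t, shape, scale)
```

Under this parameterisation, setting `p_cure = 0` gives a two-component prevalence-incidence mixture, in which the cumulative probability tends to 1 as `t` grows. With `p_cure > 0` it plateaus at `1 - p_cure` instead, which is the behaviour a cure component is meant to capture.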