We develop a functional proportional hazards mixture cure (FPHMC) model with scalar and functional covariates measured at baseline. The mixture cure model, which is useful for studying populations in which a fraction of subjects is cured of a particular event of interest, is extended to functional data. We employ the EM algorithm and develop a semiparametric penalized spline-based approach to estimate the dynamic functional coefficients of the incidence and latency parts. The proposed method is computationally efficient and imposes smoothness on the estimated functional coefficients via a roughness penalty. Simulation studies show that the proposed method accurately estimates the model parameters and the baseline survival function. Finally, the clinical potential of the model is demonstrated in two real data examples that use rich high-dimensional biomedical signals as functional covariates measured at baseline and that constitute novel domains for applying cure survival models in contemporary medical settings. In particular, we analyze i) minute-by-minute physical activity data from the National Health and Nutrition Examination Survey (NHANES) 2003-2006 to study the association between diurnal patterns of physical activity (PA) at baseline and all-cancer mortality through 2019 while adjusting for other biological factors; and ii) the impact of daily functional measures of disease severity collected in the intensive care unit (ICU) on post-ICU recovery and mortality. Our findings provide novel epidemiological insights into the association between daily patterns of PA and cancer mortality. A software implementation and an illustration of the proposed estimation method are provided in R.
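For orientation, the structure described above can be sketched in generic notation. The symbols below are illustrative assumptions rather than the paper's own notation: Z denotes the scalar covariates, X(s) a functional covariate observed over a domain S, \pi the probability of being uncured (incidence), and S_u and h_u the survival and hazard functions of uncured subjects (latency). The logistic link for the incidence and the second-derivative form of the roughness penalty are common choices assumed here, not details confirmed by the abstract.

```latex
% Population survival: cured subjects never experience the event
S_{\mathrm{pop}}(t \mid Z, X) = 1 - \pi(Z, X) + \pi(Z, X)\, S_u(t \mid Z, X)

% Incidence part (logistic link assumed), with functional coefficient \beta(s)
\operatorname{logit}\, \pi(Z, X) = \alpha_0 + Z^{\top}\alpha + \int_{S} X(s)\, \beta(s)\, ds

% Latency part: proportional hazards with functional coefficient \delta(s)
h_u(t \mid Z, X) = h_0(t) \exp\Big\{ Z^{\top}\gamma + \int_{S} X(s)\, \delta(s)\, ds \Big\}

% Penalized log-likelihood (second-derivative roughness penalty assumed)
\ell_{\mathrm{pen}} = \ell
  - \lambda_{\beta} \int_{S} \{\beta''(s)\}^{2}\, ds
  - \lambda_{\delta} \int_{S} \{\delta''(s)\}^{2}\, ds
```

Under this sketch, expanding \beta(s) and \delta(s) in a spline basis reduces each integral to a finite linear predictor, and the smoothing parameters \lambda_{\beta} and \lambda_{\delta} control the trade-off between fit and smoothness of the estimated coefficient functions.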