Survival among cancer patients is highly heterogeneous, ranging from a few months to several decades. Accurate prediction of clinical outcomes requires a model that relates patients' molecular profiles to their survival. Because the relationship between survival and high-dimensional molecular predictors is complex, it is challenging to perform non-parametric modeling and remove irrelevant predictors simultaneously. In this paper, we build a semi-parametric kernel Cox proportional hazards model and propose a novel regularized garrotized kernel machine (RegGKM) method to fit it. The kernel machine describes the complex relationship between survival and predictors, while a LASSO penalty automatically removes irrelevant parametric and non-parametric predictors. We develop an efficient algorithm for the proposed method in high-dimensional settings. In simulations, the proposed method consistently achieves better predictive accuracy than competing methods. We apply the method to a multiple myeloma dataset to predict patients' risk of death from their gene expression profiles. Our results can help classify patients into groups with different death risks, facilitating treatment decisions for better clinical outcomes.
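To make the idea of a garrotized kernel with a sparsity penalty concrete, the following is a minimal sketch and not the RegGKM algorithm itself. It assumes a Gaussian kernel in which each predictor k is scaled by a non-negative weight `delta[k]`, so that `delta[k] = 0` drops that predictor from the kernel, and it evaluates a Cox negative log partial likelihood (no tied event times assumed) plus an L1 penalty on `delta`. The function names, the bandwidth `rho`, the tuning parameters `lam_alpha` and `lam_delta`, and the exact form of the penalty are illustrative assumptions; the parametric part of the model is omitted.

```python
import numpy as np


def garrotized_gaussian_kernel(X, delta, rho=1.0):
    """K[i, j] = exp(-sum_k delta[k] * (X[i, k] - X[j, k])**2 / rho).

    Assumed form of a garrotized Gaussian kernel: delta[k] >= 0 scales
    predictor k, and delta[k] == 0 removes it from the kernel entirely.
    """
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=float)
    if np.any(delta < 0):
        raise ValueError("delta must be non-negative")
    diff = X[:, None, :] - X[None, :, :]            # pairwise differences, shape (n, n, p)
    weighted_sq = np.einsum("ijk,k->ij", diff ** 2, delta)
    return np.exp(-weighted_sq / rho)


def penalized_objective(alpha, delta, X, time, event,
                        lam_alpha=1.0, lam_delta=1.0, rho=1.0):
    """Cox negative log partial likelihood plus illustrative penalties.

    The risk score is h = K @ alpha. Assumes no tied event times.
    The penalties (a ridge-type term on h and an L1 term on delta)
    are hypothetical choices made for this sketch.
    """
    alpha = np.asarray(alpha, dtype=float)
    delta = np.asarray(delta, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)

    K = garrotized_gaussian_kernel(X, delta, rho)
    h = K @ alpha

    # Sort by decreasing time so the risk set of subject i is a prefix.
    order = np.argsort(-time)
    h_sorted = h[order]
    event_sorted = event[order]

    # log of sum_{j in risk set} exp(h_j), computed stably as a running log-sum-exp.
    log_risk = np.logaddexp.accumulate(h_sorted)
    neg_log_pl = -np.sum(event_sorted * (h_sorted - log_risk))

    # delta is non-negative, so its L1 norm is just its sum.
    penalty = lam_alpha * (alpha @ K @ alpha) + lam_delta * np.sum(delta)
    return neg_log_pl + penalty
```

In this sketch, a predictor whose weight in `delta` is shrunk to zero no longer affects the kernel matrix, which is how a LASSO-type penalty on the weights could remove an irrelevant non-parametric predictor.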