This paper introduces FLASH, a prognostic method for the joint modelling of longitudinal data and censored durations when a large number of both longitudinal and time-independent features are available. In the literature, standard joint models are of either the shared random effects type or the joint latent class type. Combining ideas from both and using appropriate regularisation techniques, we define a new model that automatically identifies significant prognostic longitudinal features in a high-dimensional context, a capability of increasing importance in areas such as personalised medicine and churn prediction. We develop an estimation methodology based on the EM algorithm and provide an efficient implementation. The statistical performance of the method is demonstrated both in extensive Monte Carlo simulation studies and on publicly available real-world datasets. In a so-called "real-time" prediction setting, our method significantly outperforms state-of-the-art joint models in predicting the latent class membership probability, as measured by the C-index, while running orders of magnitude faster than competing methods. In addition, our model automatically identifies significant features that are relevant from a practical perspective, making it interpretable.