This paper introduces FLASH, a prognostic method for the joint modelling of longitudinal data and censored durations when a large number of both longitudinal and time-independent features are available. In the literature, standard joint models are of either the shared random effect or the joint latent class type. Combining ideas from both approaches and using appropriate regularisation techniques, we define a new model that automatically identifies significant prognostic longitudinal features in a high-dimensional context, a capability of increasing importance in areas such as personalised medicine and churn prediction. We develop an estimation methodology based on the EM algorithm and provide an efficient implementation. The statistical performance of the method is demonstrated both in extensive Monte Carlo simulation studies and on publicly available real-world datasets. Our method significantly outperforms state-of-the-art joint models in predicting the latent class membership probability, as measured by the C-index, in a so-called ``real-time'' prediction setting, while running orders of magnitude faster than competing methods. In addition, our model automatically identifies significant features that are relevant from a practical perspective, making it interpretable.