Fast and flexible inference for joint models of multivariate longitudinal and survival data using Integrated Nested Laplace Approximations

Modeling longitudinal and survival data jointly offers many advantages, such as addressing measurement error and missing data in the longitudinal processes, understanding and quantifying the association between the longitudinal markers and the survival events, and predicting the risk of events based on the longitudinal markers. A joint model involves multiple submodels (one for each longitudinal or survival outcome), usually linked together through correlated or shared random effects. Estimating these models is computationally expensive, particularly because the likelihood must be integrated over the multidimensional distribution of the random effects. Inference therefore rapidly becomes intractable, which restricts applications of joint models to a small number of longitudinal markers and/or random effects. We introduce a Bayesian approximation based on the Integrated Nested Laplace Approximation (INLA) algorithm, implemented in the R package R-INLA, to alleviate the computational burden and allow the estimation of multivariate joint models with fewer restrictions. Our simulation studies show that R-INLA substantially reduces the computation time and the variability of the parameter estimates compared to alternative estimation strategies. We further apply the methodology to a clinical trial on primary biliary cholangitis, analyzing 5 longitudinal markers (3 continuous, 1 count, 1 binary) with a total of 16 random effects, together with competing risks of death and transplantation. R-INLA provides a fast and reliable inference technique for applying joint models to the complex multivariate data encountered in health research.
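
To make the computational bottleneck concrete, a generic shared-random-effects joint model can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the text: $Y_{ik}(t)$ is marker $k$ for subject $i$, $g_k$ a link function, $b_i$ the vector of subject-level random effects, $\varphi_k$ the association parameters, and $(T_i, \delta_i)$ the observed event time and event indicator. The exact parameterization used with R-INLA may differ.

```latex
% Longitudinal submodel for marker k = 1, ..., K (assumed generic form)
g_k\!\left( \mathbb{E}\left[ Y_{ik}(t) \mid b_{ik} \right] \right)
  = \eta_{ik}(t)
  = X_{ik}(t)^{\top} \beta_k + Z_{ik}(t)^{\top} b_{ik},
\qquad
b_i = (b_{i1}^{\top}, \dots, b_{iK}^{\top})^{\top} \sim \mathcal{N}(0, \Sigma)

% Survival submodel sharing the linear predictors (assumed association structure)
\lambda_i(t \mid b_i)
  = \lambda_0(t) \exp\!\left( W_i^{\top} \gamma + \sum_{k=1}^{K} \varphi_k \, \eta_{ik}(t) \right)

% Marginal likelihood contribution of subject i: the integral over b_i
% has the same dimension as the number of random effects per subject
L_i(\theta)
  = \int \left[ \prod_{k=1}^{K} \prod_{j=1}^{n_{ik}} p\!\left( y_{ijk} \mid b_i ; \theta \right) \right]
    \lambda_i(T_i \mid b_i)^{\delta_i}
    \exp\!\left( - \int_0^{T_i} \lambda_i(s \mid b_i) \, ds \right)
    p(b_i ; \theta) \, db_i
```

Under this sketch, each additional marker or random effect adds a dimension to the integral over $b_i$, which is why integration-based estimation scales poorly as the model grows.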
