Ensemble Machine Learning and Statistical Procedures for Dynamic Predictions of Time-to-Event Outcomes

Nina van Gerwen, Sten Willemsen, Bettina E. Hansen, Christophe Corpechot, Marco Carbone, Cynthia Levy, Maria-Carlota Londoño, Atsushi Tanaka, Palak Trivedi, Alejandra Villamil, Gideon Hirschfield, Dimitris Rizopoulos

Dynamic predictions for longitudinal and time-to-event outcomes have become a versatile tool in precision medicine. Our work is motivated by the application of dynamic predictions in the decision-making process for patients with primary biliary cholangitis. For these patients, serial biomarker measurements (e.g., bilirubin and alkaline phosphatase levels) are routinely collected to inform treating physicians of the risk of liver failure and to guide clinical decision-making. Two popular statistical approaches for deriving dynamic predictions are joint modelling and landmarking; more recently, machine learning techniques have also been proposed. Each approach has its merits, and no single method consistently outperforms all others, which makes it challenging to obtain the best possible survival estimates. We therefore extend the Super Learner framework to combine dynamic predictions from different models and procedures. Super Learner is an ensemble learning technique that allows users to combine different prediction algorithms to improve predictive accuracy and flexibility. It uses cross-validation, together with an objective function (e.g., squared loss) chosen to suit the application, to build the optimally weighted combination of predictions from a library of candidate algorithms. In our work, we pay special attention to objective functions that are appropriate for obtaining the optimal weighted combination of dynamic predictions. In our primary biliary cholangitis application, Super Learner offered distinct benefits: it flexibly combined outputs from a diverse set of models with varying assumptions and achieved predictive performance equal to or better than that of any model fitted separately.
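To make the weighting step concrete, the following is a minimal sketch of how a convex combination of candidate predictions could be chosen by minimising a squared loss. It is an illustration under stated assumptions, not the procedure used in this work. The cross-validated predictions are assumed to be available already, the three candidate columns and all numbers are hypothetical toy values, and censoring is ignored (a real time-to-event application would need a loss that accounts for it). The exponentiated-gradient optimiser and the names `squared_loss` and `fit_convex_weights` are choices made for this sketch only.

```python
import math


def squared_loss(weights, cv_preds, outcomes):
    """Mean squared error of the weighted ensemble prediction."""
    total = 0.0
    for row, y in zip(cv_preds, outcomes):
        ensemble = sum(w * p for w, p in zip(weights, row))
        total += (ensemble - y) ** 2
    return total / len(outcomes)


def fit_convex_weights(cv_preds, outcomes, n_iter=5000, step=0.5):
    """Minimise the squared loss over non-negative weights that sum to one.

    Uses exponentiated-gradient updates, which keep the weights on the
    probability simplex without an explicit projection step.
    """
    n_models = len(cv_preds[0])
    n_obs = len(outcomes)
    weights = [1.0 / n_models] * n_models  # start from equal weights
    for _ in range(n_iter):
        # Gradient of the mean squared loss with respect to each weight.
        grad = [0.0] * n_models
        for row, y in zip(cv_preds, outcomes):
            resid = sum(w * p for w, p in zip(weights, row)) - y
            for k, p in enumerate(row):
                grad[k] += 2.0 * resid * p / n_obs
        # Multiplicative update followed by renormalisation.
        unnorm = [w * math.exp(-step * g) for w, g in zip(weights, grad)]
        norm = sum(unnorm)
        weights = [u / norm for u in unnorm]
    return weights


# Hypothetical toy data: 1 = event-free at the prediction horizon, 0 = event.
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]

# Hypothetical cross-validated event-free probabilities from three candidates
# (columns: joint model, landmarking, machine learning model).
cv_preds = [
    [0.9, 0.6, 0.7],
    [0.8, 0.7, 0.4],
    [0.3, 0.4, 0.2],
    [0.7, 0.9, 0.6],
    [0.2, 0.5, 0.1],
    [0.4, 0.3, 0.5],
    [0.6, 0.8, 0.9],
    [0.1, 0.2, 0.3],
]

weights = fit_convex_weights(cv_preds, outcomes)
print("weights:", [round(w, 3) for w in weights])
print("ensemble loss:", round(squared_loss(weights, cv_preds, outcomes), 4))
```

Because each single candidate corresponds to one corner of the weight simplex, the fitted combination can do no worse on the cross-validated loss than the best individual candidate, up to optimisation error. This is the sense in which an ensemble of this kind can match or improve on any model fitted separately.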
