Recurrent events are common in clinical, healthcare, social, and behavioral studies. A recent framework for analyzing potentially censored recurrent event data constructs a censored longitudinal data set consisting of the times to the first recurrent event in multiple prespecified follow-up windows of length $\tau$. Meanwhile, with the staggering number of potential predictors generated from genetic, -omic, and electronic health record sources, machine learning approaches such as the random forest are growing in popularity because they can incorporate information from highly correlated predictors with non-standard relationships. In this paper, we bring these two developments together by developing a random forest approach that dynamically predicts the probability of remaining event-free during a subsequent $\tau$-duration follow-up period from a reconstructed censored longitudinal data set. In settings where the association between predictors and recurrent event outcomes is complex, our random forest algorithm shows improved ability to predict this probability compared with the recurrent event modeling framework of Xia et al. (2020). We illustrate the proposed algorithm using recurrent exacerbation data from the Azithromycin for the Prevention of Exacerbations of Chronic Obstructive Pulmonary Disease study (Albert et al., 2011).
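The construction of a censored longitudinal data set from recurrent event times can be illustrated with a minimal Python sketch. It shows one plausible reading of the windowing idea, not the authors' implementation: for each prespecified window start, it records the time to the first event after that start, restricted to $\tau$ and to the remaining observation time. The function name `build_window_rows`, the row fields, and the convention that an event exactly at the window start belongs to the previous window are all assumptions made for illustration.

```python
from bisect import bisect_right
from math import inf


def build_window_rows(event_times, censor_time, window_starts, tau):
    """Turn one subject's recurrent event times into per-window rows.

    Hypothetical helper: each row describes the follow-up window that
    begins at `start` and lasts at most `tau` time units.
    """
    events = sorted(event_times)
    rows = []
    for start in window_starts:
        if start >= censor_time:
            continue  # subject is no longer under observation
        # Index of the first event strictly after the window start.
        i = bisect_right(events, start)
        has_next = i < len(events) and events[i] <= censor_time
        time_to_event = events[i] - start if has_next else inf
        time_to_censor = censor_time - start
        # Observed time is restricted to tau and to the censoring time.
        observed = min(time_to_event, time_to_censor, tau)
        # An event counts only if it occurs within the window and
        # before censoring.
        event = time_to_event <= min(time_to_censor, tau)
        rows.append({"start": start, "time": observed, "event": event})
    return rows


# Illustrative subject: events at 2.0 and 7.5, censored at 10.0.
example_rows = build_window_rows(
    event_times=[2.0, 7.5],
    censor_time=10.0,
    window_starts=[0.0, 3.0, 6.0, 9.0],
    tau=3.0,
)
```

In this sketch, a row with `event` false and `time` equal to `tau` is a window known to be event-free, while a row with `event` false and `time` below `tau` is censored before the window ends. Rows of this shape are the kind of input a prediction model for the probability of remaining event-free over a window would consume.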