Recurrent events are common in clinical, healthcare, social, and behavioral studies. A recent framework for analyzing potentially censored recurrent event data constructs a censored longitudinal data set consisting of times to the first recurrent event in multiple prespecified follow-up windows of length $\tau$. With the staggering number of potential predictors being generated from genetic, -omic, and electronic health record sources, machine learning approaches such as the random forest are growing in popularity, as they can incorporate information from highly correlated predictors with non-standard relationships. In this paper, we bridge the gap between these two lines of work by developing a random forest approach for dynamically predicting the probability of remaining event-free during a subsequent $\tau$-duration follow-up period from a reconstructed censored longitudinal data set. We demonstrate the improved ability of our random forest algorithm to predict this probability, compared with the recurrent event modeling framework of Xia et al. (2020), in settings where the association between predictors and recurrent event outcomes is complex. The proposed random forest algorithm is illustrated using recurrent exacerbation data from the Azithromycin for the Prevention of Exacerbations of Chronic Obstructive Pulmonary Disease study (Albert et al., 2011).
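To make the reconstructed data set concrete, the following is a minimal sketch of how one subject's recurrent event history might be turned into window-level records. It is not the construction of Xia et al. (2020) or of this paper. The function name `build_window_records`, the `WindowRecord` type, the three status labels, and the convention that an event occurring exactly at a window start belongs to the preceding window are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Sequence


class WindowRecord(NamedTuple):
    start: float   # window start time t
    time: float    # observed time measured from t, capped at tau
    status: str    # "event", "event_free", or "censored" (hypothetical labels)


def build_window_records(
    event_times: Sequence[float],
    censor_time: float,
    window_starts: Sequence[float],
    tau: float,
) -> List[WindowRecord]:
    """Build one record per follow-up window for a single subject.

    Assumed conventions (not taken from the source):
    - a window starting at t covers the interval (t, t + tau];
    - windows starting at or after the censoring time are dropped.
    """
    records = []
    for t in window_starts:
        if censor_time <= t:
            continue  # subject is no longer under observation at t
        follow_up = censor_time - t
        # Gaps from t to each observed event strictly after t.
        gaps = [e - t for e in event_times if t < e <= censor_time]
        first_gap = min(gaps) if gaps else None
        if first_gap is not None and first_gap <= tau:
            # First event falls inside the window.
            records.append(WindowRecord(t, first_gap, "event"))
        elif follow_up >= tau:
            # Observed for the full window with no event in it.
            records.append(WindowRecord(t, tau, "event_free"))
        else:
            # Follow-up ended before tau with no event observed.
            records.append(WindowRecord(t, follow_up, "censored"))
    return records


if __name__ == "__main__":
    # Hypothetical subject: events at times 5 and 30, censored at 40.
    for rec in build_window_records([5, 30], 40, [0, 12, 24, 36], tau=12):
        print(rec)
```

Under these assumptions, each window contributes at most one record per subject, so the stacked records form a longitudinal data set with possibly censored outcomes, to which a prediction method for the $\tau$-duration event-free probability could then be applied.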