The R package DynForest implements random forests for predicting a continuous, categorical, or time-to-event outcome (possibly with multiple causes of event) from time-fixed and time-dependent predictors. Its main originality is that it handles time-dependent predictors that may be endogenous (i.e., affected by the outcome process), measured with error, and measured at subject-specific times. At each recursive step of the tree-building process, the time-dependent predictors are internally summarized into individual-level features on which the split can be made. This is achieved using flexible linear mixed models (estimated with the R package lcmm) whose specification is defined in advance by the user. DynForest returns the mean for a continuous outcome, the majority-vote category for a categorical outcome, or the cumulative incidence function over time for a survival outcome. DynForest also computes variable importance and minimal depth to identify the most predictive variables or groups of variables. This paper guides the user through step-by-step examples of fitting random forests with DynForest.
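The summarize-then-split idea described above can be illustrated with a small, self-contained sketch. It is written in Python for brevity and is not DynForest's implementation: a per-subject least-squares line stands in for the linear mixed model fitted with lcmm, the split criterion (within-child sum of squares for a continuous outcome) is an assumption, and all function names and the toy data are hypothetical.

```python
from statistics import mean


def summarize_trajectory(times, values):
    """Reduce one subject's repeated measures to an intercept and a slope.

    Simplification: an ordinary least-squares line per subject, used here
    in place of the random effects of a linear mixed model.
    """
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return {"intercept": y_bar - slope * t_bar, "slope": slope}


def best_split(feature, outcome):
    """Return (threshold, score) minimising the within-child sum of squares."""

    def sse(ys):
        m = mean(ys)
        return sum((y - m) ** 2 for y in ys)

    best = None
    xs = sorted(set(feature))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(feature, outcome) if x <= thr]
        right = [y for x, y in zip(feature, outcome) if x > thr]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (thr, score)
    return best


# Hypothetical toy data: each subject has its own measurement times.
subjects = {
    "a": ([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]),
    "b": ([0.0, 2.0, 4.0], [1.0, 1.0, 1.0]),
    "c": ([0.5, 1.5], [0.0, 2.0]),
    "d": ([0.0, 1.0, 3.0], [2.0, 2.0, 2.0]),
}
outcome = {"a": 10.0, "b": 2.0, "c": 12.0, "d": 3.0}

ids = list(subjects)
slopes = [summarize_trajectory(*subjects[i])["slope"] for i in ids]
best = best_split(slopes, [outcome[i] for i in ids])
print("slopes:", dict(zip(ids, slopes)))
print("best split on slope (threshold, score):", best)
```

In this toy setting the node is split on the subject-level slope. According to the description above, the package instead derives the individual features from the user-specified mixed models and recomputes them at each recursive step of tree building.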