The Random Forest (RF) algorithm can be applied to a broad spectrum of problems, including time series prediction. However, neither the classical independent and identically distributed (IID) bootstrap nor block bootstrapping strategies (as implemented in rangerts) fully account for the nature of the data generating process (DGP) when resampling the observations. We propose combining RF with a residual bootstrapping technique: the IID bootstrap is replaced with the AR-Sieve Bootstrap (ARSB), which assumes the DGP to be an autoregressive process. To assess the new model's predictive performance, we conduct a simulation study using synthetic data generated from different types of DGPs. The results show that ARSB provides more variation among the trees in the forest. Moreover, RF with ARSB achieves higher accuracy than RF with the other bootstrap strategies. However, these improvements come at some cost in efficiency.
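The residual resampling idea behind the AR-Sieve Bootstrap can be illustrated with a minimal sketch. It assumes a least-squares fit of an AR(p) model with a fixed, user-chosen order `p`, and it is not the authors' implementation. The function names `fit_ar` and `ar_sieve_bootstrap`, the `burn_in` parameter, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def fit_ar(y, p):
    """Least-squares fit of an AR(p) model to the mean-centered series.

    Returns the series mean, the AR coefficients (lag 1 first),
    and the centered residuals.
    """
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    z = y - mu
    n = len(z)
    # Column k holds the lag-(k+1) values z_{t-k-1} for t = p, ..., n-1.
    X = np.column_stack([z[p - k - 1 : n - k - 1] for k in range(p)])
    target = z[p:]
    phi, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ phi
    return mu, phi, resid - resid.mean()


def ar_sieve_bootstrap(y, p, rng, burn_in=100):
    """Draw one bootstrap series of the same length as y.

    Residuals of the fitted AR(p) model are resampled IID with replacement
    and fed back through the fitted recursion. The first `burn_in` values
    are discarded (an assumed choice to reduce the effect of zero start values).
    """
    mu, phi, resid = fit_ar(y, p)
    n = len(y)
    total = n + burn_in + p
    eps = rng.choice(resid, size=total, replace=True)
    z = np.zeros(total)
    for t in range(p, total):
        # z[t - p : t][::-1] is (z_{t-1}, ..., z_{t-p}), matching phi's order.
        z[t] = phi @ z[t - p : t][::-1] + eps[t]
    return mu + z[-n:]


# Example: an AR(1) series with coefficient 0.6 and one bootstrap replicate.
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
series = np.zeros(2000)
for t in range(1, 2000):
    series[t] = 0.6 * series[t - 1] + noise[t]

replicate = ar_sieve_bootstrap(series, p=2, rng=rng)
```

In a forest, each tree would presumably be grown on a different replicate of this kind in place of an IID resample of the rows. How the replicate is turned into a training set (for example, through lagged features) is not specified in the abstract and is left open here. If the fitted coefficients imply a non-stationary process, the recursion can diverge, so a practical version would need to check for that.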