Random Forest (RF) is a well-known data-driven algorithm applied in many fields thanks to its flexibility in modeling the relationship between the response variable and the predictors, even in the presence of strong non-linearities. In environmental applications, the phenomenon of interest often exhibits spatial and/or temporal dependence, which the standard version of RF does not explicitly take into account. In this work, we propose a taxonomy that classifies strategies according to when (Pre-, In- and/or Post-processing) they incorporate spatial information into regression RF. Moreover, following the criteria of the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA), we provide a systematic review and classification of the most recent strategies adopted to "adjust" regression RF to spatially dependent data. PRISMA is a reproducible methodology for collecting and processing the existing literature on a specified topic from different sources; it starts with a query and ends with a set of scientific documents to review. We performed an online query on 25 October 2022 and, in the end, 32 documents were considered for review. The methodological strategies employed and the application fields considered in these 32 documents are described and discussed.