Merging satellite products with ground-based measurements is often required to obtain precipitation datasets that cover large regions at high spatial density and are more accurate than satellite-only precipitation products. Machine and statistical learning regression algorithms are regularly used for this purpose. At the same time, tree-based ensemble algorithms are adopted in various fields to solve regression problems with high accuracy and low computational cost. Still, the literature offers no guidance on which tree-based ensemble algorithm to select for correcting satellite precipitation products over the contiguous United States (US) at the daily time scale. In this study, we work towards filling this methodological gap by conducting an extensive comparison of three such algorithms: random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost). We used daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets, together with ground-based precipitation observations from the Global Historical Climatology Network daily (GHCNd) database. The experiments covered the entire contiguous US and also included linear regression as a benchmark. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared...
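
To make the comparison design concrete, the following is a minimal sketch of such a benchmark, not the study's actual pipeline. Everything in it is an assumption made for illustration: the data are synthetic stand-ins for the gauge observations and the two satellite estimates, the hyperparameters are arbitrary, mean squared error is used as the score, and scikit-learn's `GradientBoostingRegressor` stands in for gbm. XGBoost is included only if the `xgboost` package is installed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): a skewed "gauge" precipitation series and two
# noisy, biased "satellite" estimates standing in for PERSIANN and IMERG.
rng = np.random.default_rng(0)
n = 2000
gauge = rng.gamma(shape=0.5, scale=8.0, size=n)
persiann_like = np.clip(0.6 * gauge + rng.normal(0.0, 2.0, n), 0.0, None)
imerg_like = np.clip(2.0 * np.sqrt(gauge) + rng.normal(0.0, 1.0, n), 0.0, None)

X = np.column_stack([persiann_like, imerg_like])  # predictors: satellite values
y = gauge                                         # target: ground-based value

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Candidate algorithms; linear regression serves as the benchmark.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gbm_sklearn": GradientBoostingRegressor(random_state=0),
}
try:
    # Optional dependency: skipped silently if xgboost is not installed.
    from xgboost import XGBRegressor
    models["xgboost"] = XGBRegressor(n_estimators=100, random_state=0)
except ImportError:
    pass

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = float(mean_squared_error(y_test, model.predict(X_test)))

# Report each model's test error relative to the linear regression benchmark.
benchmark = scores["linear_regression"]
for name, mse in scores.items():
    improvement = 100.0 * (1.0 - mse / benchmark)
    print(f"{name:18s} MSE={mse:8.3f}  improvement over benchmark={improvement:6.1f}%")
```

Reporting each score relative to the linear regression benchmark mirrors the benchmarking role that linear regression plays in the study. The numbers printed for this synthetic example say nothing about the actual results.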