Regression algorithms are regularly used to improve the accuracy of satellite precipitation products. In this context, satellite precipitation and topography data serve as the predictor variables, and gauge-measured precipitation data serve as the dependent variable. Alongside this, it is increasingly recognised in many fields that combining algorithms through ensemble learning can substantially improve predictive performance. Still, the literature currently lacks a sufficient number of ensemble learners for improving the accuracy of satellite precipitation products, as well as a large-scale comparison of such learners. In this study, we work towards filling this specific gap by proposing 11 new ensemble learners in the field and by extensively comparing them. We apply the ensemble learners to monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets, which span a 15-year period and the entire contiguous United States (CONUS). We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The ensemble learners combine the predictions of six machine learning regression algorithms (base learners), namely multivariate adaptive regression splines (MARS), multivariate adaptive polynomial splines (poly-MARS), random forests (RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and Bayesian regularized neural networks (BRNN), and each ensemble learner is based on a different combiner. The combiners include the equal-weight combiner, the median combiner, two best learners and seven variants of a sophisticated stacking method. The latter stacks a regression algorithm on top of the base learners to combine their independent predictions...
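The combiner families listed above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and every name in it (`equal_weight`, `median_combiner`, `best_learner_index`, `fit_stacking_ols`, `predict_stacking`) is hypothetical. Two further assumptions are made: the "best learner" is taken to be the base learner with the lowest mean squared error, and the stacking meta-learner is ordinary least squares without an intercept. The meta-learners actually used in the study's seven stacking variants are not specified in this excerpt. In practice, a stacking meta-learner is usually fitted on out-of-sample (e.g. cross-validated) base-learner predictions; the toy example below fits and predicts on the same four samples for brevity.

```python
"""Hypothetical sketch of four combiner families for base-learner predictions.

`preds` holds one row per sample and one column per base learner
(e.g. MARS, RF, XGBoost, ...); `y` holds the gauge-measured targets.
"""
from statistics import mean, median


def equal_weight(preds):
    """Equal-weight combiner: arithmetic mean of the base-learner predictions."""
    return [mean(row) for row in preds]


def median_combiner(preds):
    """Median combiner: per-sample median of the base-learner predictions."""
    return [median(row) for row in preds]


def mse(pred, obs):
    """Mean squared error between two equal-length sequences."""
    return mean((p - o) ** 2 for p, o in zip(pred, obs))


def best_learner_index(preds, y):
    """Assumed best-learner rule: column with the lowest MSE against y."""
    n_learners = len(preds[0])
    errors = [mse([row[j] for row in preds], y) for j in range(n_learners)]
    return min(range(n_learners), key=lambda j: errors[j])


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_stacking_ols(preds, y):
    """Assumed meta-learner: OLS weights w solving (P^T P) w = P^T y (no intercept)."""
    k = len(preds[0])
    ptp = [[sum(row[i] * row[j] for row in preds) for j in range(k)] for i in range(k)]
    pty = [sum(row[i] * yi for row, yi in zip(preds, y)) for i in range(k)]
    return _solve(ptp, pty)


def predict_stacking(preds, w):
    """Combine base-learner predictions with the fitted meta-learner weights."""
    return [sum(wi * pi for wi, pi in zip(w, row)) for row in preds]


if __name__ == "__main__":
    # Toy data: three hypothetical base learners, four samples; y = 0.6*col0 + 0.4*col1.
    preds = [[10, 20, 5], [30, 10, 40], [20, 40, 15], [50, 30, 20]]
    y = [14, 22, 28, 42]
    print(equal_weight(preds))
    print(median_combiner(preds))  # [10, 30, 20, 30]
    print(best_learner_index(preds, y))  # 0
    w = fit_stacking_ols(preds, y)
    print(w)  # approximately [0.6, 0.4, 0.0]
    print(predict_stacking(preds, w))  # approximately [14, 22, 28, 42]
```

The equal-weight and median combiners need no training data, whereas the best-learner and stacking combiners must be fitted against observations. The hand-written solver only keeps the sketch self-contained; with NumPy available, `numpy.linalg.lstsq` would be the more robust choice for the least-squares step.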