Uncertainty estimation in spatial interpolation of satellite precipitation with ensemble learning

Predictions in the form of probability distributions are crucial for decision-making. Quantile regression provides such predictions in spatial interpolation settings that merge remote sensing and gauge precipitation data. However, ensemble learning of quantile regression algorithms remains unexplored in this context. Here, we address this gap by introducing nine quantile-based ensemble learners and applying them to large precipitation datasets. We employ a novel feature engineering strategy that reduces the predictors to distance-weighted satellite precipitation at relevant locations, combined with location elevation. Our ensemble learners comprise six stacking methods and three simple methods (mean, median, best combiner), and they combine six individual algorithms: quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM), and quantile regression neural networks (QRNN). These algorithms serve as both base learners and combiners within the different stacking methods. We evaluate performance against QR, the reference method, using quantile scoring functions on a large dataset comprising 15 years of monthly gauge-measured and satellite precipitation over the contiguous United States (CONUS). Stacking with QR and QRNN yielded the best results across the quantile levels of interest (0.025, 0.050, 0.075, 0.100, 0.200, 0.300, 0.400, 0.500, 0.600, 0.700, 0.800, 0.900, 0.925, 0.950, 0.975), surpassing the reference method by 3.91% to 8.95%. This demonstrates the potential of stacking to improve probabilistic predictions in spatial interpolation and beyond.
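To make the evaluation idea concrete, the following is a minimal sketch of a quantile scoring function (the standard pinball loss), the two simplest combiners (mean and median), and a relative improvement over a reference score. It is an illustration under stated assumptions, not the implementation used in this work: the function names `quantile_score`, `combine`, and `relative_improvement` are hypothetical, and the percentage formula is one common way to express skill relative to a reference, assumed here rather than taken from the text. Stacking, in which a quantile regression algorithm is trained on the base learners' predictions, is not shown.

```python
import numpy as np


def quantile_score(y_true, y_pred, tau):
    """Mean quantile (pinball) score at quantile level tau; lower is better."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by tau,
    # over-prediction (diff < 0) by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


def combine(base_preds, how="mean"):
    """Combine base-learner predictions of one quantile level.

    base_preds has shape (n_learners, n_samples). Only the simple mean and
    median combiners are sketched here.
    """
    base_preds = np.asarray(base_preds, dtype=float)
    if how == "mean":
        return base_preds.mean(axis=0)
    if how == "median":
        return np.median(base_preds, axis=0)
    raise ValueError(f"unknown combiner: {how!r}")


def relative_improvement(score, reference_score):
    """Percentage improvement of `score` over `reference_score` (assumed formula)."""
    return 100.0 * (reference_score - score) / reference_score


if __name__ == "__main__":
    # Synthetic numbers for illustration only; not data from the study.
    y = np.array([10.0, 20.0, 30.0])
    preds = np.array([
        [12.0, 18.0, 33.0],  # hypothetical base learner 1
        [11.0, 22.0, 29.0],  # hypothetical base learner 2
        [14.0, 21.0, 31.0],  # hypothetical base learner 3
    ])
    tau = 0.5
    reference = quantile_score(y, preds[0], tau)
    combined = quantile_score(y, combine(preds, "median"), tau)
    print(relative_improvement(combined, reference))
```

In practice the score would be computed separately for each quantile level of interest, since a combiner that helps at the median does not necessarily help in the tails.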
