The importance of accurately quantifying forecast uncertainty has motivated much recent research on probabilistic forecasting. In particular, a variety of deep learning approaches have been proposed, with forecast distributions obtained as the output of neural networks. These neural network-based methods are often used in the form of an ensemble, e.g., based on multiple model runs from different random initializations or on more sophisticated ensembling strategies such as dropout, resulting in a collection of forecast distributions that need to be aggregated into a final probabilistic prediction. With the aim of consolidating findings from the machine learning literature on ensemble methods and the statistical literature on forecast combination, we address the question of how to aggregate distribution forecasts based on such `deep ensembles'. Using theoretical arguments and a comprehensive analysis of twelve benchmark data sets, we systematically compare probability- and quantile-based aggregation methods for three neural network-based approaches that produce different types of forecast distributions as output. Our results show that combining forecast distributions from deep ensembles can substantially improve predictive performance. We propose a general quantile aggregation framework for deep ensembles that allows for corrections of systematic deficiencies and performs well in a variety of settings, often outperforming a linear combination of the forecast densities. Finally, we investigate the effects of the ensemble size and derive recommendations for aggregating distribution forecasts from deep ensembles in practice.
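The contrast between probability- and quantile-based aggregation can be sketched for the simple case of Gaussian ensemble members. This is a minimal illustration, not the paper's implementation: the member parameters below are invented, and the linear pool averages the member CDFs while quantile aggregation (Vincentization) averages the member quantile functions.

```python
from statistics import NormalDist

# Hypothetical deep ensemble of Gaussian forecast distributions;
# the (mean, std) pairs are purely illustrative.
members = [NormalDist(mu, sigma)
           for mu, sigma in [(0.0, 1.0), (0.5, 1.2), (-0.3, 0.8)]]

def linear_pool_cdf(x):
    """Probability-based aggregation: average the member CDFs
    (i.e., an equally weighted mixture distribution)."""
    return sum(d.cdf(x) for d in members) / len(members)

def vincentized_quantile(p):
    """Quantile-based aggregation: average the member quantile
    functions at probability level p (Vincentization)."""
    return sum(d.inv_cdf(p) for d in members) / len(members)
```

For Gaussian members the Vincentized median equals the average of the member means, whereas the linear pool yields a (generally wider, possibly multimodal) mixture; this difference is one reason the two aggregation schemes can behave quite differently in practice.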