Ensemble weather forecasts based on multiple runs of numerical weather prediction models typically show systematic errors and require post-processing to obtain reliable forecasts. Accurately modeling multivariate dependencies is crucial in many practical applications, and various approaches to multivariate post-processing have been proposed in which ensemble predictions are first post-processed separately in each margin and multivariate dependencies are then restored via copulas. These two-step methods share key limitations, in particular the difficulty of including additional predictors when modeling the dependencies. We propose a novel multivariate post-processing method based on generative machine learning to address these challenges. In this new class of nonparametric, data-driven distributional regression models, samples from the multivariate forecast distribution are obtained directly as the output of a generative neural network. The generative model is trained by optimizing a proper scoring rule that measures the discrepancy between the generated and observed data, conditional on exogenous input variables. Our method does not require parametric assumptions on univariate distributions or multivariate dependencies and allows for incorporating arbitrary predictors. In two case studies on multivariate temperature and wind speed forecasting at weather stations over Germany, our generative model shows significant improvements over state-of-the-art methods and particularly improves the representation of spatial dependencies.
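To make the training objective more concrete, the following is a minimal sketch of a sample-based multivariate proper scoring rule that compares generated samples with an observation. The abstract does not name the scoring rule used, so the choice of the energy score here is an assumption made for illustration, and the function name `energy_score` is hypothetical. Lower values indicate a better forecast; training a generative network would additionally require a differentiable version of this computation in a deep learning framework.

```python
import math


def energy_score(samples, obs):
    """Sample-based estimate of the energy score (illustrative assumption).

    samples: list of m generated vectors, each of dimension d
    obs:     observed vector of dimension d
    Returns  mean ||X_i - y|| - 1/(2 m^2) * sum_{i,j} ||X_i - X_j||.
    """
    m = len(samples)
    # Average Euclidean distance between generated samples and the observation
    term_obs = sum(math.dist(x, obs) for x in samples) / m
    # Average pairwise distance among generated samples (rewards spread)
    term_spread = sum(
        math.dist(x, x_other) for x in samples for x_other in samples
    ) / (2 * m * m)
    return term_obs - term_spread


# Hypothetical example: two generated samples for a two-station forecast
generated = [[0.0, 0.0], [2.0, 0.0]]
observed = [1.0, 0.0]
score = energy_score(generated, observed)
```

The second term rewards spread among the generated samples, so the score cannot be minimized simply by collapsing every sample onto a single point estimate unless the observation is matched exactly.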