Distributional regression aims to estimate the conditional distribution of a target variable given explanatory covariates. It is a crucial tool for forecasting when precise uncertainty quantification is required. A popular methodology consists of fitting a parametric model via empirical risk minimization, where the risk is measured by the Continuous Ranked Probability Score (CRPS). For independent and identically distributed observations, we provide a concentration result for the estimation error and an upper bound for its expectation. Furthermore, we consider model selection performed by minimization of the validation error and provide a concentration bound for the regret. A similar result is proved for convex aggregation of models. Finally, we show that our results may be applied to various models such as Ensemble Model Output Statistics (EMOS), distributional regression networks, distributional nearest neighbors, or distributional random forests, and we illustrate our findings on two data sets (QSAR aquatic toxicity and Airfoil self-noise).
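To make the CRPS-based risk and the validation-based model selection concrete, here is a minimal sketch. It assumes a Gaussian predictive distribution (as is common in EMOS-style models) so that the CRPS has a well-known closed form; the abstract itself does not fix a distribution family. The function names `crps_gaussian`, `mean_crps`, and `select_by_validation` are hypothetical and chosen for illustration only.

```python
import math


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast against observation y.

    Assumption: the predictive distribution is Gaussian.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps(forecasts, observations):
    """Empirical risk: average CRPS over (mu, sigma) forecasts and observations."""
    scores = [crps_gaussian(mu, s, y) for (mu, s), y in zip(forecasts, observations)]
    return sum(scores) / len(scores)


def select_by_validation(candidates, observations):
    """Return the name of the candidate with the smallest validation CRPS.

    `candidates` maps a (hypothetical) model name to its list of
    (mu, sigma) forecasts on the validation set.
    """
    return min(candidates, key=lambda name: mean_crps(candidates[name], observations))
```

In this sketch, fitting a model by empirical risk minimization would amount to choosing the parameters that produce `(mu, sigma)` so as to minimize `mean_crps` on the training sample, while `select_by_validation` mirrors the model selection step by comparing the same score on held-out data.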