Predictions and forecasts of machine learning models should take the form of probability distributions, so as to increase the amount of information communicated to end users. Although probabilistic prediction and forecasting with machine learning models are increasingly applied in academia and industry, the related concepts and methods have not yet been formalized and structured under a holistic view of the entire field. Here, we review predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review spans the period from the introduction of early statistical models (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are more flexible by nature. Reviewing the progress in the field expedites our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements build on a few fundamental concepts applied to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming active topics of research.
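The abstract does not commit to particular metrics. Purely as an illustration, the sketch below assumes two widely used examples. The first is the quantile (pinball) loss, a consistent scoring function for quantile forecasts such as those produced by quantile regression. The second is the closed-form continuous ranked probability score (CRPS) for a Gaussian predictive distribution, a proper scoring rule. Both are negatively oriented, so lower is better. The function names are hypothetical and only the Python standard library is used.

```python
import math


def quantile_score(y: float, q: float, tau: float) -> float:
    """Pinball loss of a tau-level quantile forecast q for observation y.

    Illustrative choice: this is a consistent scoring function for the
    tau-quantile. It is non-negative and lower is better.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    indicator = 1.0 if y < q else 0.0
    return (indicator - tau) * (q - y)


def crps_gaussian(y: float, mu: float, sigma: float) -> float:
    """CRPS of a Normal(mu, sigma^2) predictive distribution at observation y.

    Uses the closed form sigma * [z(2Phi(z) - 1) + 2phi(z) - 1/sqrt(pi)],
    with z = (y - mu) / sigma. The CRPS is a proper scoring rule and lower
    is better.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


if __name__ == "__main__":
    # A 0.9-quantile forecast of 8.0 that undershoots an observed 10.0.
    print(quantile_score(10.0, 8.0, 0.9))
    # A standard normal forecast scored at its mean and far in its tail.
    print(crps_gaussian(0.0, 0.0, 1.0), crps_gaussian(3.0, 0.0, 1.0))
```

In practice, both scores are averaged over a test set. A forecast that places more probability mass near the observations receives a lower average CRPS.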