Machine learning models are widely used to solve real-world problems in science and industry. To build robust models, we need to quantify the uncertainty of a model's predictions on new data. This study proposes a new method for uncertainty estimation based on a surrogate Gaussian process model. The method can equip any base model with an accurate uncertainty estimate produced by a separate surrogate. Compared to other approaches, it remains computationally efficient, since only one additional model has to be trained, and it does not rely on data-specific assumptions. The only requirement is that the base model be available as a black box, which is typically the case. Experiments on challenging time-series forecasting data show that surrogate model-based methods provide more accurate confidence intervals than bootstrap-based methods in both medium- and small-data regimes and across different families of base models, including linear regression, ARIMA, and gradient boosting.
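The general idea of a surrogate-based uncertainty estimate can be illustrated with a minimal sketch. It is not the procedure from the study, whose details the abstract does not give. It assumes one plausible reading: a Gaussian process with a squared-exponential kernel is fitted to the outputs of a black-box base model, and its posterior standard deviation is used as the uncertainty around the base prediction. The names `rbf_kernel`, `fit_surrogate_gp`, and `base_predict`, the toy linear base model, and the values of `noise` and `length_scale` are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def fit_surrogate_gp(x_train, base_predict, noise=1e-2, length_scale=1.0):
    """Fit a zero-mean GP to the base model's outputs (hypothetical helper)."""
    targets = base_predict(x_train)  # the base model is only queried, never inspected
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, targets))

    def predict(x_new):
        K_s = rbf_kernel(x_train, x_new, length_scale)
        mean = K_s.T @ alpha
        v = np.linalg.solve(L, K_s)
        var = 1.0 - np.sum(v**2, axis=0)  # prior variance of this kernel is 1
        return mean, np.sqrt(np.maximum(var, 0.0))

    return predict

def base_predict(x):
    """Toy stand-in for an arbitrary black-box base model (assumption)."""
    return 0.5 * x + 1.0

x_train = np.linspace(0.0, 5.0, 20)
surrogate = fit_surrogate_gp(x_train, base_predict)

# One point inside the training range and one far outside it.
x_new = np.array([2.5, 10.0])
mean, std = surrogate(x_new)

# Approximate 95% interval centred on the base prediction (an assumed convention).
lower = base_predict(x_new) - 1.96 * std
upper = base_predict(x_new) + 1.96 * std
```

In this sketch the interval is narrow at the point inside the training range and wide at the point far from it, and only one extra model is fitted, whereas a bootstrap would refit the base model many times.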