Gaussian processes (GPs) are a Bayesian machine learning approach widely used to construct surrogate models for the uncertainty quantification of computer simulation codes in industrial applications. A GP provides both a mean predictor and an estimate of the posterior prediction variance, the latter being used to produce Bayesian credibility intervals. Interpreting these intervals relies on the Gaussianity of the simulation model and on the well-specification of the priors, two assumptions that do not always hold. We address this issue with conformal prediction: we propose a method for building adaptive cross-conformal prediction intervals by weighting the non-conformity score with the posterior standard deviation of the GP. The resulting conformal prediction intervals exhibit a level of adaptivity akin to Bayesian credibility sets and display a significant correlation with the local approximation error of the surrogate model, while being free from the underlying model assumptions and having frequentist coverage guarantees. These estimators can thus be used to evaluate the quality of a GP surrogate model and can assist a decision-maker in choosing the best prior for the specific application of the GP. The performance of the method is illustrated through a panel of numerical examples based on various reference databases. Moreover, the potential applicability of the method is demonstrated in the context of surrogate modeling of an expensive-to-evaluate simulator of the clogging phenomenon in steam generators of nuclear reactors.
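To make the weighting idea concrete, here is a minimal sketch of a conformal interval whose non-conformity score is the absolute residual divided by the GP posterior standard deviation. It is a simplification and not the method of the paper: it uses split conformal prediction with a held-out calibration set instead of the cross-conformal scheme described above. The function names (`normalized_scores`, `conformal_quantile`, `adaptive_intervals`) and the toy numbers are hypothetical, and the posterior means and standard deviations are assumed to come from any GP fitted on data disjoint from the calibration points.

```python
import math


def normalized_scores(y, mu, sd):
    """Non-conformity scores |y - mu| / sd: residuals weighted by the
    GP posterior standard deviation at each calibration point."""
    return [abs(yi - mi) / si for yi, mi, si in zip(y, mu, sd)]


def conformal_quantile(scores, alpha):
    """Finite-sample quantile used in split conformal prediction:
    the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def adaptive_intervals(mu_test, sd_test, q):
    """Intervals mu +/- q * sd: wider where the GP reports more uncertainty."""
    return [(m - q * s, m + q * s) for m, s in zip(mu_test, sd_test)]


# Toy calibration data (hypothetical): observed outputs, GP posterior means,
# and GP posterior standard deviations at held-out calibration inputs.
y_cal = [1.0, 2.0, 3.0, 4.0, 5.0]
mu_cal = [1.5, 2.0, 2.5, 4.25, 6.0]
sd_cal = [0.5, 1.0, 0.25, 0.5, 2.0]

scores = normalized_scores(y_cal, mu_cal, sd_cal)
q = conformal_quantile(scores, alpha=0.25)

# GP posterior mean and standard deviation at two new inputs (hypothetical).
intervals = adaptive_intervals([0.0, 10.0], [0.5, 2.0], q)
print(q, intervals)
```

Because the same multiplier `q` scales each point's own standard deviation, the interval width varies across inputs in the same way a Bayesian credibility interval would, while the calibration step relies only on exchangeability of the calibration and test points, not on the GP prior being well specified.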