High certainty in predictive models is crucial for making informed and trustworthy decisions in many scientific and engineering domains. However, the extensive experimentation required to achieve accurate models can be both costly and time-consuming. This paper presents an adaptive sampling approach designed to reduce epistemic uncertainty in predictive models. Our primary contribution is a metric that estimates potential epistemic uncertainty using neural networks that generate prediction intervals. The estimate relies on the distance between the predicted upper and lower bounds and the observed data at the tested positions and their neighboring points. Our second contribution is a batch sampling strategy based on Gaussian processes (GPs). At each iteration of the adaptive sampling process, a GP serves as a surrogate model of the networks trained at that iteration. Using this GP, we design an acquisition function that selects a combination of sampling locations to maximize the reduction of epistemic uncertainty across the domain. We test our approach on three one-dimensional synthetic problems and on a multi-dimensional dataset based on an agricultural field, where the task is to select experimental fertilizer rates. The results demonstrate that our method consistently converges to minimum epistemic uncertainty levels faster than Normalizing Flows Ensembles, MC-Dropout, and simple GPs.
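To make the distance-based idea concrete, the following is a minimal sketch of one way such a score could be computed for one-dimensional inputs. It is not the metric defined in the paper, whose exact formula the abstract does not give. The function name `potential_epistemic_uncertainty`, the choice of the farther bound as the per-point distance, and the neighborhood size `k` are all assumptions made for illustration.

```python
import numpy as np


def potential_epistemic_uncertainty(x_obs, y_obs, lower, upper, k=2):
    """Hypothetical proxy score, NOT the paper's metric.

    For each tested position, take the distance from the observation to the
    farther of the two predicted bounds, then average that distance over the
    k nearest tested positions (the position itself included).
    """
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    # Distance between each observation and the farther predicted bound.
    gap = np.maximum(upper - y_obs, y_obs - lower)

    # Pairwise distances between tested positions (1-D inputs assumed).
    pairwise = np.abs(x_obs[:, None] - x_obs[None, :])

    # Indices of the k nearest tested positions; a stable sort keeps the
    # lower index first when two neighbors are equally far away.
    neighbors = np.argsort(pairwise, axis=1, kind="stable")[:, :k]

    # Average the per-point gaps over each neighborhood.
    return gap[neighbors].mean(axis=1)


# Toy data: the interval is much wider around the isolated point x = 10.
scores = potential_epistemic_uncertainty(
    x_obs=[0, 1, 2, 10],
    y_obs=[0, 0, 0, 0],
    lower=[-1, -1, -1, -3],
    upper=[1, 1, 1, 5],
    k=2,
)
print(scores)
```

In this toy setup the isolated position with the widest interval receives the highest score, which is the kind of signal an acquisition function could use to decide where to sample next.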