With advances in scientific computing and mathematical modeling, complex scientific phenomena such as galaxy formation and rocket propulsion can now be reliably simulated. Such simulations can, however, be very time-intensive, requiring millions of CPU hours to perform. One solution is multi-fidelity emulation, which uses data of different fidelities to train an efficient predictive model that emulates the expensive simulator. For complex scientific problems, and with careful elicitation from scientists, such multi-fidelity data can often be linked by a directed acyclic graph (DAG) that represents the scientific model dependencies. We thus propose a new Graphical Multi-fidelity Gaussian Process (GMGP) model, which embeds this DAG structure (capturing scientific dependencies) within a Gaussian process framework. We show that the GMGP has desirable modeling traits via two Markov properties, and that it admits a scalable algorithm for recursive computation of the posterior mean and variance at each depth level of the DAG. We also present a novel experimental design methodology over the DAG given an experimental budget, and propose a nonlinear extension of the GMGP via deep Gaussian processes. The advantages of the GMGP are then demonstrated via a suite of numerical experiments and an application to the emulation of heavy-ion collisions, which can be used to study the conditions of matter in the Universe shortly after the Big Bang. The proposed model has broader uses in data fusion applications with graphical structure, which we further discuss.