With advances in scientific computing and mathematical modeling, complex scientific phenomena such as galaxy formation and rocket propulsion can now be reliably simulated. Such simulations can, however, be very time-intensive, requiring millions of CPU hours to perform. One solution is multi-fidelity emulation, which uses data of different fidelities to train an efficient predictive model that emulates the expensive simulator. For complex scientific problems, and with careful elicitation from scientists, such multi-fidelity data may often be linked by a directed acyclic graph (DAG) representing its scientific model dependencies. We thus propose a new Graphical Multi-fidelity Gaussian Process (GMGP) model, which embeds this DAG structure (capturing scientific dependencies) within a Gaussian process framework. We show that the GMGP has desirable modeling traits via two Markov properties, and admits a scalable algorithm for recursive computation of the posterior mean and variance at each depth level of the DAG. We also present a novel experimental design methodology over the DAG given an experimental budget, and propose a nonlinear extension of the GMGP via deep Gaussian processes. The advantages of the GMGP are then demonstrated via a suite of numerical experiments and an application to emulation of heavy-ion collisions, which can be used to study the conditions of matter in the Universe shortly after the Big Bang. The proposed model has broader uses in data fusion applications with graphical structure, which we further discuss.