Scalable surrogate models enable efficient emulation of computer models (or simulators), particularly when the ensemble of runs is large. Gaussian process (GP) models are commonly used for emulation, but they scale poorly to truly large datasets. Dense functional output, such as spatial or time-series data, adds further complexity and requires careful handling to keep emulation fast. This work presents a highly scalable emulator for functional data that builds on the work of Kennedy and O'Hagan (2001) and Higdon et al. (2008) and incorporates the local approximate Gaussian process framework proposed by Gramacy and Apley (2015). The emulator uses global GP lengthscale parameter estimates to scale the input space, which substantially improves prediction speed. We demonstrate that our fast approximation-based emulator is a viable alternative to the methods of Higdon et al. (2008) for functional response while drastically reducing computational cost. The proposed emulator is applied to quickly calibrate the multiphysics continuum hydrodynamics simulator FLAG using a large ensemble of 20,000 runs. The methods presented are implemented in the R package FlaGP.
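The idea of scaling the input space by global lengthscale estimates before making a local GP prediction can be illustrated with a minimal sketch. This is not the FlaGP implementation or its API: the function names `scale_inputs` and `local_gp_predict` are hypothetical, the lengthscales are assumed to be supplied rather than estimated, the output is scalar rather than functional, and plain nearest-neighbor selection stands in for the neighborhood design criteria of the local approximate GP framework.

```python
import numpy as np


def scale_inputs(X, lengthscales):
    """Divide each input dimension by its (assumed, pre-estimated) lengthscale."""
    return np.asarray(X, dtype=float) / np.asarray(lengthscales, dtype=float)


def local_gp_predict(X_train, y_train, x_new, lengthscales,
                     n_neighbors=25, nugget=1e-6):
    """Predict at x_new using a small GP fit only to nearby training runs.

    After scaling, a single isotropic kernel with unit lengthscale is used,
    so no per-prediction lengthscale estimation is needed (the assumption
    this sketch is meant to illustrate).
    """
    Xs = scale_inputs(X_train, lengthscales)
    xs = scale_inputs(x_new, lengthscales)
    y = np.asarray(y_train, dtype=float)

    # Squared distances from the prediction point in the scaled space.
    d2 = np.sum((Xs - xs) ** 2, axis=1)
    idx = np.argsort(d2)[:min(n_neighbors, len(y))]
    Xn, yn = Xs[idx], y[idx]

    # Gaussian kernel among the neighbors, with a nugget for stability.
    D = np.sum((Xn[:, None, :] - Xn[None, :, :]) ** 2, axis=2)
    K = np.exp(-D) + nugget * np.eye(len(idx))
    k = np.exp(-d2[idx])

    # Simple-kriging mean around the local sample mean.
    mu = yn.mean()
    weights = np.linalg.solve(K, k)
    return mu + weights @ (yn - mu)
```

Each prediction solves a linear system whose size is the neighborhood size rather than the ensemble size, which is the source of the speedup in this kind of approximation; the cost of finding neighbors is left as a brute-force sort here for brevity.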