Computer simulations, especially of complex phenomena, can be expensive and often require high-performance computing resources. To understand a phenomenon, multiple simulations are typically run, each with a different set of input parameters. The resulting data are then used to create an interpolant, or surrogate, that relates the simulation outputs to the corresponding inputs. When the inputs and outputs are scalars, a simple machine learning model can suffice. However, when the simulation outputs are vector-valued, available at locations in two or three spatial dimensions, and often have a temporal component, creating a surrogate is more challenging. In this report, we use a two-dimensional problem of a jet interacting with high explosives to understand how we can build high-quality surrogates. The characteristics of our data set are unique: the vector-valued outputs from each simulation are available at over two million spatial locations; each simulation is run for a relatively small number of time steps; the size of the computational domain varies with each simulation; and resource constraints limit the number of simulations we can run. We show how we analyze these extremely large data sets, set the parameters for the algorithms used in the analysis, and use simple ways to improve the accuracy of the spatio-temporal surrogates without substantially increasing the number of simulations required.
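To make the idea of a surrogate for spatially distributed outputs concrete, the following is a minimal sketch of one common approach, not necessarily the method used in this report: compress the simulation snapshots with a truncated singular value decomposition, then fit a linear least-squares map from the input parameters to the reduced coefficients. The function names `fit_surrogate` and `predict`, the choice of a linear map, and the synthetic field in the usage lines are all assumptions made for illustration.

```python
import numpy as np

def fit_surrogate(params, snapshots, n_modes=2):
    """Fit a reduced-order surrogate (illustrative, hypothetical helper).

    params:    (n_sims, n_params) simulation input parameters
    snapshots: (n_sims, n_points) one flattened output field per simulation
    """
    mean = snapshots.mean(axis=0)
    centered = snapshots - mean
    # Thin SVD of the centered snapshots; rows of vt are spatial modes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]                      # (n_modes, n_points)
    coeffs = centered @ modes.T               # (n_sims, n_modes)
    # Affine least-squares map from parameters to modal coefficients.
    design = np.hstack([np.ones((len(params), 1)), params])
    weights, *_ = np.linalg.lstsq(design, coeffs, rcond=None)
    return mean, modes, weights

def predict(model, new_params):
    """Predict the output field(s) at new input parameters."""
    mean, modes, weights = model
    new_params = np.atleast_2d(new_params)
    design = np.hstack([np.ones((len(new_params), 1)), new_params])
    return mean + design @ weights @ modes    # (n_new, n_points)

# Usage on a small synthetic problem (assumed data, not from the report).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
train_params = rng.uniform(0.0, 1.0, size=(8, 2))
train_fields = (train_params[:, :1] * np.sin(np.pi * x)
                + train_params[:, 1:] * np.cos(np.pi * x) + x)
model = fit_surrogate(train_params, train_fields, n_modes=2)
field = predict(model, [0.3, 0.7])            # shape (1, 50)
```

With millions of spatial locations, varying domain sizes, and few simulations, a real workflow would need more care than this sketch shows, for example in aligning the domains before forming the snapshot matrix and in choosing a nonlinear regressor for the coefficients.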