Scientific modeling and engineering applications rely heavily on parameter estimation methods to fit physical models and calibrate numerical simulations using real-world measurements. In the absence of analytic statistical models with tractable likelihoods, modern simulation-based inference (SBI) methods first use a numerical simulator to generate a dataset of parameters and simulated outputs. This dataset is then used to approximate the likelihood and estimate the system parameters given observed data. Several SBI methods employ machine learning emulators to accelerate data generation and parameter estimation. However, applying these approaches to high-dimensional physical systems remains challenging due to the cost and complexity of training high-dimensional emulators. This paper introduces Embed and Emulate (E&E), a new SBI method based on contrastive learning that efficiently handles high-dimensional data and complex, multimodal parameter posteriors. E&E learns a low-dimensional latent embedding of the data (i.e., a summary statistic) and a corresponding fast emulator in the latent space, eliminating the need to run expensive simulations or a high-dimensional emulator during inference. We illustrate the theoretical properties of the learned latent space through a synthetic experiment and demonstrate superior performance over existing methods in a realistic, non-identifiable parameter estimation task using the high-dimensional, chaotic Lorenz 96 system.
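To make the contrastive-learning idea concrete, the following is a minimal sketch of a symmetric InfoNCE-style loss that pulls each data embedding toward the latent vector paired with its parameters and pushes it away from the other pairs in the batch. The abstract states only that E&E is based on contrastive learning, so the specific loss form, the function names `l2_normalize` and `info_nce`, and the `temperature` parameter are illustrative assumptions rather than the paper's actual objective.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(data_emb, param_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired latent vectors.

    data_emb[i]  : latent embedding of simulated output i (hypothetical
                   output of an embedding network).
    param_emb[i] : latent vector predicted from the parameters that
                   generated output i (hypothetical emulator output).
    NOTE: an assumed, generic contrastive loss, not the E&E objective.
    """
    z = [l2_normalize(v) for v in data_emb]
    w = [l2_normalize(v) for v in param_emb]
    n = len(z)

    # logits[i][j] = cosine similarity between pair (i, j), scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(z[i], w[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def mean_cross_entropy(matrix):
        # Cross-entropy with the matching pair (diagonal entry) as the target,
        # using a numerically stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    # Average the data-to-parameter and parameter-to-data directions.
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(transposed))
```

Under these assumptions, a batch in which each data embedding aligns with its own parameter latent gives a loss near zero, while mismatched pairs give a much larger loss. At inference time, candidate parameters could then be scored against an observation entirely in the low-dimensional latent space, which reflects the abstract's point about avoiding expensive simulations or a high-dimensional emulator.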