MF-GLaM: A multifidelity stochastic emulator using generalized lambda models

Stochastic simulators exhibit intrinsic stochasticity due to unobservable, uncontrollable, or unmodeled input variables, resulting in random outputs even at fixed input conditions. Such simulators are common across scientific disciplines; however, emulating their full conditional probability distribution is challenging because traditional deterministic surrogate modeling techniques are not designed for this task. Additionally, accurately characterizing the response distribution can require prohibitively large datasets, especially for computationally expensive high-fidelity (HF) simulators. When lower-fidelity (LF) stochastic simulators are available, they can supplement limited HF information within a multifidelity surrogate modeling (MFSM) framework. While MFSM techniques are well established for deterministic settings, constructing multifidelity emulators that predict the full conditional response distribution of stochastic simulators remains a challenge. In this paper, we propose multifidelity generalized lambda models (MF-GLaMs) to efficiently emulate the conditional response distribution of HF stochastic simulators by exploiting data from LF stochastic simulators. Our approach builds upon the generalized lambda model (GLaM), which represents the conditional distribution at each input by a flexible, four-parameter generalized lambda distribution. MF-GLaMs are non-intrusive: they require neither access to the internal stochasticity of the simulators nor multiple replications at the same input values. We demonstrate the efficacy of MF-GLaMs on synthetic examples of increasing complexity and on a realistic earthquake application. Results show that MF-GLaMs can achieve improved accuracy at the same cost as single-fidelity GLaMs, or comparable performance at significantly reduced cost.
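
To make the four-parameter generalized lambda distribution concrete, the sketch below evaluates its quantile function and draws samples by inverse-transform sampling. It assumes the Freimer-Kollia-Mudholkar-Lin (FKML) parameterization, Q(u) = lambda1 + (1/lambda2) * [ (u^lambda3 - 1)/lambda3 - ((1-u)^lambda4 - 1)/lambda4 ], which the abstract does not state explicitly. The function names and the toy mapping `toy_lambdas` from input to parameters are hypothetical illustrations, not the MF-GLaM construction itself.

```python
import numpy as np

def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Quantile function of the generalized lambda distribution
    (FKML parameterization assumed). lam1: location, lam2 > 0: scale,
    lam3 and lam4: left and right tail shape."""
    u = np.asarray(u, dtype=float)

    def box(v, lam):
        # (v**lam - 1) / lam, with its limit log(v) as lam -> 0
        if abs(lam) < 1e-12:
            return np.log(v)
        return (v**lam - 1.0) / lam

    return lam1 + (box(u, lam3) - box(1.0 - u, lam4)) / lam2

def toy_lambdas(x):
    """Hypothetical dependence of the four parameters on a scalar input x.
    Illustration only; not taken from the paper."""
    return (np.sin(x), np.exp(0.5 * x), 0.2, 0.1)

def sample_response(x, n, rng):
    """Draw n samples of the emulated response at input x by pushing
    uniform variates through the quantile function."""
    u = rng.uniform(size=n)
    return gld_quantile(u, *toy_lambdas(x))

rng = np.random.default_rng(0)
samples = sample_response(x=1.0, n=1000, rng=rng)
```

Because the distribution is defined through its quantile function, sampling needs only uniform random numbers, and the whole conditional distribution at an input is summarized by four numbers.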