We study the problem of distribution shift that commonly arises in machine-learning-augmented hybrid simulation, where parts of a simulation algorithm are replaced by data-driven surrogates. We first establish a mathematical framework for understanding the structure of machine-learning-augmented hybrid simulation problems, together with the causes and effects of the associated distribution shift. We show, both numerically and theoretically, that distribution shift is correlated with simulation error. We then propose a simple methodology based on a tangent-space regularized estimator to control the distribution shift, thereby improving the long-term accuracy of the simulation results. In the case of linear dynamics, we provide a thorough theoretical analysis that quantifies the effectiveness of the proposed method. Moreover, we conduct several numerical experiments, including simulating a partially known reaction-diffusion equation and solving the Navier-Stokes equations using the projection method with a data-driven pressure solver. In all cases, we observe marked improvements in simulation accuracy under the proposed method, especially for systems with a high degree of distribution shift, such as those with relatively strong nonlinear reaction mechanisms or flows at large Reynolds numbers.
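To make the cause and effect of distribution shift concrete, here is a toy sketch; it is not the paper's framework or its tangent-space regularized estimator. All of its ingredients are hypothetical illustration choices: a 4-dimensional linear one-step map x_{n+1} = K x_n + U x_n, where the known part K is kept and the unknown part U is replaced by a least-squares surrogate F fitted to noisy data, and training trajectories that lie in the invariant plane S = span(e_0, e_1). Small fitting errors push the hybrid trajectory off S, into directions where F was never constrained by data, and the resulting shift (distance from S) grows together with the simulation error.

```python
import numpy as np


def hybrid_shift_demo(steps=40, noise=1e-3, seed=0):
    """Toy linear hybrid simulation; returns (shift, error) arrays over time.

    Hypothetical setup: all matrices, sizes and noise levels are illustrative.
    """
    rng = np.random.default_rng(seed)
    d, theta = 4, 0.3
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])

    # Known part K of the one-step map: it expands the normal directions (coords 2, 3).
    K = np.diag([0.5, 0.5, 1.5, 1.5])
    # Unknown part U (to be replaced by a surrogate). In the true map A = K + U the
    # plane S = span(e0, e1) is invariant (pure rotation R on S) and the normal
    # directions contract (1.5 - 1.0 = 0.5).
    U = np.zeros((d, d))
    U[:2, :2] = R - 0.5 * np.eye(2)
    U[2:, 2:] = -1.0 * np.eye(2)
    A = K + U

    # Training data: true trajectories started in S, so every sample lies in S
    # (this plays the role of the training distribution).
    states = []
    for _ in range(10):
        x = np.zeros(d)
        x[:2] = rng.standard_normal(2)
        for _ in range(20):
            states.append(x)
            x = A @ x
    X = np.array(states).T                              # shape (d, N)
    Y = U @ X + noise * rng.standard_normal(X.shape)    # noisy labels for U x

    # Least-squares surrogate F ~ U. lstsq returns the minimum-norm solution, so F
    # is determined by the data only on S; its action on normal directions is ~0.
    F = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T

    # Roll out the reference and the hybrid (K + surrogate) simulations from S.
    x_true = np.array([1.0, 0.0, 0.0, 0.0])
    x_hyb = x_true.copy()
    shift, error = [], []
    for _ in range(steps + 1):
        shift.append(np.linalg.norm(x_hyb[2:]))         # distance from S
        error.append(np.linalg.norm(x_hyb - x_true))    # simulation error
        x_true = A @ x_true
        x_hyb = K @ x_hyb + F @ x_hyb
    return np.array(shift), np.array(error)


shift, error = hybrid_shift_demo()
print(f"shift: {shift[1]:.1e} -> {shift[-1]:.1e}, error: {error[1]:.1e} -> {error[-1]:.1e}")
print("corr(shift, error) =", np.corrcoef(shift, error)[0, 1])
```

The function name `hybrid_shift_demo` and its parameters are invented for this illustration. Because the true trajectory stays in S, the simulation error is bounded below by the shift, and once the hybrid state leaves S the unconstrained normal dynamics (here the expansion factor 1.5 from K) amplify it; controlling exactly this off-manifold behavior is what a regularized estimator would target.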