Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical high-dimensional DA methods, such as the ensemble Kalman filter, rely on Gaussian approximations that break down for complex dynamics or observation operators. To address this limitation, we introduce DAISI, a scalable filtering algorithm built on flow-based generative models that enables flexible probabilistic inference using data-driven priors. The core idea is to use a stationary, pre-trained generative prior that first incorporates forecast information through a novel inverse-sampling step and then assimilates observations via guidance-based conditional sampling. This allows any forecasting model to be used within the DA pipeline without retraining or fine-tuning the generative prior at each assimilation step. Experiments on challenging nonlinear systems show that DAISI achieves accurate filtering results in regimes with sparse, noisy, and nonlinear observations where traditional methods struggle. The code for DAISI is available at https://github.com/Erik-Wikingsson/DAISI.
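To make the Gaussian approximation behind the ensemble Kalman filter concrete, here is a minimal sketch of the standard stochastic (perturbed-observation) EnKF analysis step for a scalar state, in plain Python. It illustrates the classical baseline only and is not DAISI; the names `enkf_analysis` and `fraction_near` and all parameter values are hypothetical choices for this illustration. Each member receives a linear update built from ensemble (co)variances, which is optimal only when the prior and noise are Gaussian and the observation operator is linear.

```python
import random
import statistics


def enkf_analysis(ensemble, y, obs_std, h=lambda x: x, rng=None):
    """Stochastic (perturbed-observation) EnKF analysis step, scalar state.

    Hypothetical helper for illustration. ensemble: forecast members x_i;
    y: scalar observation; obs_std: std of Gaussian observation noise;
    h: observation operator (identity by default).
    """
    rng = rng if rng is not None else random.Random(0)
    n = len(ensemble)
    hx = [h(x) for x in ensemble]
    x_mean = statistics.fmean(ensemble)
    hx_mean = statistics.fmean(hx)
    # Ensemble estimates of Cov(x, h(x)) and Var(h(x)).
    c_xh = sum((x - x_mean) * (v - hx_mean) for x, v in zip(ensemble, hx)) / (n - 1)
    c_hh = sum((v - hx_mean) ** 2 for v in hx) / (n - 1)
    # Kalman gain: optimal only under Gaussian / linear assumptions.
    gain = c_xh / (c_hh + obs_std ** 2)
    # Shift each member linearly toward its own perturbed copy of y.
    return [
        x + gain * (y + rng.gauss(0.0, obs_std) - v)
        for x, v in zip(ensemble, hx)
    ]


def fraction_near(values, center, width):
    """Fraction of values whose magnitude |v| lies within width of center."""
    return sum(abs(abs(v) - center) < width for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(42)
    prior = [rng.gauss(0.0, 1.0) for _ in range(500)]

    # Linear case: the exact posterior is N(0.8, 0.2), so the ensemble mean
    # and std should be close to 0.8 and sqrt(0.2) (about 0.447).
    post_lin = enkf_analysis(prior, y=1.0, obs_std=0.5, rng=rng)
    print("linear:", statistics.fmean(post_lin), statistics.stdev(post_lin))

    # Nonlinear case h(x) = x**2 with y = 1: the true posterior is bimodal
    # near x = +1 and x = -1, but Cov(x, x**2) is about 0 for this symmetric
    # prior, so the gain nearly vanishes and the ensemble barely moves.
    post_sq = enkf_analysis(prior, y=1.0, obs_std=0.1, h=lambda x: x * x, rng=rng)
    print("fraction with |x| near 1:", fraction_near(post_sq, 1.0, 0.2))
```

In the nonlinear case the true posterior places almost all of its mass near |x| = 1, whereas the EnKF ensemble stays close to the standard-normal prior. This is the kind of nonlinear-observation regime in which, according to the abstract, Gaussian-based methods struggle.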