Data assimilation, in its most comprehensive form, addresses the Bayesian inverse problem of identifying plausible state trajectories that explain noisy or incomplete observations of stochastic dynamical systems. Various approaches have been proposed to solve this problem, including particle-based and variational methods. However, most algorithms rely on the transition dynamics for inference, which becomes intractable for long time horizons or for high-dimensional systems with complex dynamics, such as oceans or atmospheres. In this work, we introduce score-based data assimilation for trajectory inference. We learn a score-based generative model of state trajectories based on the key insight that the score of an arbitrarily long trajectory can be decomposed into a series of scores over short segments. After training, inference is carried out non-autoregressively with the score model, generating all states simultaneously. Notably, we decouple the observation model from the training procedure and use it only at inference to guide the generative process, which enables a wide range of zero-shot observation scenarios. We present theoretical and empirical evidence supporting the effectiveness of our method.
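The claim that the score of a long trajectory decomposes into scores over short segments can be checked on a toy case. The sketch below is not the paper's model: it assumes a hypothetical scalar AR(1) chain (the names `A`, `Q`, `log_joint`, `full_score`, and `segment_score` are illustrative only) with exact, noise-free states and first-order Markov dynamics. Under those assumptions, the derivative of the full-trajectory log-density with respect to an interior state depends only on that state and its two neighbours.

```python
import math

# Hypothetical toy model (an assumption for illustration, not the paper's setup):
# scalar AR(1) chain x[j+1] = A * x[j] + noise, noise ~ N(0, Q), x[0] ~ N(0, 1).
A, Q = 0.9, 0.25


def log_transition(x_next, x_prev):
    """log p(x_next | x_prev) for the AR(1) chain."""
    return -0.5 * (x_next - A * x_prev) ** 2 / Q - 0.5 * math.log(2 * math.pi * Q)


def log_joint(xs):
    """log p(x_0, ..., x_{L-1}) for the first-order Markov chain."""
    total = -0.5 * xs[0] ** 2 - 0.5 * math.log(2 * math.pi)
    for prev, nxt in zip(xs, xs[1:]):
        total += log_transition(nxt, prev)
    return total


def full_score(xs, i, eps=1e-5):
    """Central finite-difference d/dx_i of the full-trajectory log-density."""
    up = list(xs)
    up[i] += eps
    down = list(xs)
    down[i] -= eps
    return (log_joint(up) - log_joint(down)) / (2 * eps)


def segment_score(xs, i):
    """Analytic d/dx_i using only the window (x_{i-1}, x_i, x_{i+1}).

    Valid for interior indices only (0 < i < len(xs) - 1).
    """
    incoming = -(xs[i] - A * xs[i - 1]) / Q     # from log p(x_i | x_{i-1})
    outgoing = A * (xs[i + 1] - A * xs[i]) / Q  # from log p(x_{i+1} | x_i)
    return incoming + outgoing


# Arbitrary fixed trajectory, chosen only to exercise the comparison.
trajectory = [0.3, -0.1, 0.8, 0.5, -0.4, 0.2]
max_gap = max(
    abs(full_score(trajectory, i) - segment_score(trajectory, i))
    for i in range(1, len(trajectory) - 1)
)
print(f"largest gap between full and segment scores: {max_gap:.2e}")
```

In this toy setting the agreement is exact up to finite-difference error, because each interior state appears in only two transition terms. For a learned score over noise-perturbed states, as in a score-based generative model, a short-segment score would generally be an approximation whose quality depends on the segment length.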