Robust integration of physical knowledge and data is key to improving computational simulations, such as Earth system models. Data assimilation is crucial for achieving this goal because it provides a systematic framework for calibrating model outputs against observations, such as remote sensing imagery and ground station measurements, while quantifying uncertainty. Conventional methods, including Kalman filters and variational approaches, inherently rely on simplifying linear and Gaussian assumptions, and can be computationally expensive. Nevertheless, with the rapid adoption of data-driven methods in many areas of the computational sciences, we see potential in emulating traditional data assimilation with deep learning, especially generative models. In particular, the diffusion-based probabilistic framework overlaps substantially with data assimilation principles: both allow for conditional generation of samples within a Bayesian inverse framework. These models have shown remarkable success in text-conditioned image generation and image-controlled video synthesis. Likewise, one can frame data assimilation as observation-conditioned state calibration. In this work, we propose SLAMS: Score-based Latent Assimilation in Multimodal Setting. Specifically, we assimilate in-situ weather station data and ex-situ satellite imagery to calibrate vertical temperature profiles globally. Through extensive ablation, we demonstrate that SLAMS is robust even in low-resolution, noisy, and sparse data settings. To our knowledge, our work is the first to apply a deep generative framework to multimodal data assimilation using real-world datasets, an important step toward building robust computational simulators, including next-generation Earth system models. Our code is available at: https://github.com/yongquan-qu/SLAMS
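The overlap between score-based generation and data assimilation can be illustrated with a toy example. The sketch below is not the SLAMS method; it assumes a scalar Gaussian prior and a linear Gaussian observation, and every function name and parameter in it is hypothetical. It shows that the posterior score is the sum of a prior score and a likelihood score, and that Langevin sampling with this score should approximately recover the classical Kalman analysis.

```python
import math
import random
import statistics


def prior_score(x, mu0, s0):
    """Score (d/dx of the log density) of an assumed Gaussian prior N(mu0, s0^2)."""
    return -(x - mu0) / s0**2


def likelihood_score(x, y, r):
    """Score in x of an assumed observation model y = x + noise, noise ~ N(0, r^2)."""
    return (y - x) / r**2


def posterior_score(x, y, mu0, s0, r):
    # Bayes' rule in score form: the evidence p(y) does not depend on x,
    # so its gradient vanishes and the prior and likelihood scores add.
    return prior_score(x, mu0, s0) + likelihood_score(x, y, r)


def kalman_analysis(y, mu0, s0, r):
    """Closed-form scalar Kalman update: returns (analysis mean, analysis variance)."""
    gain = s0**2 / (s0**2 + r**2)
    return mu0 + gain * (y - mu0), (1.0 - gain) * s0**2


def langevin_samples(y, mu0, s0, r, n_samples=4000, n_steps=200, step=0.05, seed=0):
    """Unadjusted Langevin dynamics driven by the posterior score."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(mu0, s0)  # start each chain from a prior draw
        for _ in range(n_steps):
            drift = step * posterior_score(x, y, mu0, s0, r)
            x += drift + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


if __name__ == "__main__":
    y, mu0, s0, r = 3.0, 0.0, 2.0, 1.0  # illustrative values only
    mean, var = kalman_analysis(y, mu0, s0, r)
    draws = langevin_samples(y, mu0, s0, r)
    print("Kalman analysis mean/variance:", mean, var)
    print("Langevin sample mean/variance:", statistics.fmean(draws), statistics.pvariance(draws))
```

With these illustrative values the Kalman analysis mean is about 2.4 and the variance about 0.8; the Langevin sample mean should land close to the former, while the sample variance carries a small discretization bias from the finite step size. In the Gaussian case the two routes agree, and the appeal of a learned score is that the same sampling loop still applies when the prior is not Gaussian.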