Data assimilation (DA) is a fundamental component of modern weather prediction, yet it remains a major computational bottleneck in machine learning (ML)-based forecasting pipelines, which still rely on traditional variational methods. Recent generative ML-based DA methods offer a promising alternative, but they typically require many sampling steps and suffer from error accumulation under long-horizon auto-regressive rollouts with cycling assimilation. We propose FlowDA, a low-latency, weather-scale generative DA framework based on flow matching. FlowDA conditions on observations through a SetConv-based embedding and fine-tunes the Aurora foundation model to deliver accurate, efficient, and robust analyses. In experiments with observation rates decreasing from $3.9\%$ to $0.1\%$, FlowDA outperforms strong baselines with a comparable number of tunable parameters. FlowDA is also robust to observational noise and remains stable in long-horizon auto-regressive cycling DA. Overall, FlowDA points to an efficient and scalable direction for data-driven DA.
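To make the flow-matching idea concrete, the following is a minimal, generic sketch and not FlowDA's implementation. It assumes the common linear (straight-line) probability path between a noise sample `x0` and a target state `x1`. The abstract does not specify FlowDA's path, parameterization, or conditioning details. All function names are hypothetical, and `oracle_velocity` is a toy stand-in for the trained network (in FlowDA, the fine-tuned Aurora model conditioned on the SetConv observation embedding).

```python
# Generic flow-matching sketch in pure Python (illustrative assumptions only).

def interpolate(x0, x1, t):
    """Point on the straight-line path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity network on the linear path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(velocity, x0, cond, n_steps):
    """Integrate dx/dt = velocity(x, t, cond) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def oracle_velocity(x, t, cond):
    """Toy stand-in for a trained network: the exact velocity field when the
    target distribution is a single point `cond` (hypothetical, for illustration)."""
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]

if __name__ == "__main__":
    noise = [0.0, 2.0]       # starting sample (would be Gaussian noise in practice)
    analysis = [1.0, -1.0]   # toy "true" state that the velocity field points to
    print(euler_sample(oracle_velocity, noise, analysis, n_steps=4))
```

Because the paths in this toy setting are straight lines, a handful of Euler steps already lands on the target. This is the intuition behind the few-step sampling that flow matching can allow. A learned velocity field is only approximately straight, so the number of steps needed in practice is an empirical question.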