Lagrangian data assimilation aims to recover hidden Eulerian flow fields from sparse, indirect observations of moving tracers. The problem is challenging because tracer trajectories are nonlinearly coupled with the underlying flow, which makes posterior inference computationally intractable in realistic, high-dimensional systems. In this work, we develop a Lagrangian conditional Gaussian Koopman network (LaCGKN), a structure-preserving, data-driven framework for joint data assimilation and prediction from Lagrangian observations. LaCGKN embeds the Eulerian flow dynamics into a low-dimensional latent space governed by a nonlinear stochastic system with conditional Gaussian structure, which enables analytic posterior updates without ensemble forecasting. Existing conditional Gaussian Koopman formulations assume direct Eulerian observations. The Lagrangian setting places additional demands on the latent representation, which must simultaneously encode the flow dynamics and mediate the nonlinear tracer-flow interactions. To address these challenges, LaCGKN incorporates three key components: (i) tracer homogenization, which enforces permutation equivariance and allows the model to generalize across varying numbers of tracers; (ii) Fourier positional encoding, which captures spatial dependence and reconstructs local flow features at the moving tracer locations; and (iii) an SVD-inspired low-rank parameterization of the latent transition operator, which reduces model complexity while retaining expressiveness. An application to a two-layer quasi-geostrophic flow with surface tracer observations shows that LaCGKN achieves accurate and efficient Lagrangian data assimilation and prediction without relying on ensemble methods or the governing physical model. These results establish LaCGKN as a unified, computationally tractable alternative to both traditional model-based approaches and purely black-box data-driven methods.
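The abstract does not give LaCGKN's equations. The NumPy sketch below is therefore only a hypothetical illustration of how the ingredients named above can fit together. It assumes a discrete-time toy model in which the tracer displacement is linear in a latent state u (no u-independent drift) and u follows a linear-Gaussian transition. Conditional on the observed tracer paths, u is then Gaussian, and each posterior update is a closed-form Kalman-type step with no ensemble. The shared weight tensor `W` stands in for tracer homogenization, `fourier_features` for the positional encoding, and `a1 = U diag(s) V^T` for the low-rank transition. All names, sizes, and noise levels are assumptions. The actual LaCGKN learns its encoder, decoder, and operators from data and may differ substantially.

```python
import numpy as np

# All sizes, weights, and noise levels below are hypothetical illustrations.
rng = np.random.default_rng(0)
d, F, r, dt = 8, 3, 3, 0.01   # latent dim, Fourier modes per axis, rank, time step
sig_x = 0.05                   # tracer observation-noise amplitude
Qu = 0.01 * np.eye(d)          # latent process-noise covariance

def fourier_features(pos, n_modes=F, period=2 * np.pi):
    """Fourier positional encoding of positions pos (L, 2) -> (L, 4 * n_modes)."""
    k = np.arange(1, n_modes + 1)
    ang = 2 * np.pi * pos[:, :, None] * k / period           # (L, 2, n_modes)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1).reshape(len(pos), -1)

# Homogenization (sketch): one weight tensor W shared by every tracer, so
# permuting tracers permutes rows of the operator and any tracer count L works.
W = rng.normal(scale=0.3, size=(2, 4 * F, d))

def tracer_operator(pos):
    """Map the latent state u to stacked tracer velocities; shape (2L, d)."""
    phi = fourier_features(pos)                              # (L, 4F)
    return np.einsum("lf,cfd->lcd", phi, W).reshape(-1, d)   # (2L, d)

# SVD-style low-rank latent transition a1 = U diag(s) V^T with r < d.
U = np.linalg.qr(rng.normal(size=(d, r)))[0]
V = np.linalg.qr(rng.normal(size=(d, r)))[0]
s = np.array([0.95, 0.7, 0.4])
a1 = U @ np.diag(s) @ V.T

def analysis(m, P, pos, pos_next):
    """Closed-form Gaussian conditioning of u_n on the observed tracer displacement."""
    H = dt * tracer_operator(pos)                 # displacement = H u_n + noise
    y = (pos_next - pos).reshape(-1)              # observed displacement, (2L,)
    S = H @ P @ H.T + sig_x**2 * dt * np.eye(len(y))
    K = np.linalg.solve(S, H @ P).T               # Kalman gain P H^T S^{-1}
    m_post = m + K @ (y - H @ m)
    P_post = P - K @ H @ P
    return m_post, 0.5 * (P_post + P_post.T)      # keep covariance symmetric

def forecast(m, P):
    """Linear-Gaussian propagation u_{n+1} = a1 u_n + noise."""
    return a1 @ m, a1 @ P @ a1.T + Qu

def simulate(L, steps=200):
    """Generate a synthetic latent truth and tracer paths from the same toy model."""
    u = rng.normal(size=d)
    pos = rng.uniform(0, 2 * np.pi, size=(L, 2))
    us, paths = [u], [pos]
    for _ in range(steps):
        vel = (tracer_operator(pos) @ u).reshape(L, 2)
        pos = pos + dt * vel + sig_x * np.sqrt(dt) * rng.normal(size=(L, 2))
        u = a1 @ u + rng.multivariate_normal(np.zeros(d), Qu)
        us.append(u)
        paths.append(pos)
    return np.array(us), np.array(paths)

def run_filter(paths):
    """Return filtered means of u_0 .. u_{N-1} and the final forecast covariance."""
    m, P = np.zeros(d), np.eye(d)                 # matches the prior used in simulate
    filtered = []
    for n in range(len(paths) - 1):
        m, P = analysis(m, P, paths[n], paths[n + 1])
        filtered.append(m)
        m, P = forecast(m, P)
    return np.array(filtered), P

us, paths = simulate(L=20)
means, P = run_filter(paths)
truth = us[:-1]                                   # aligned with the filtered means
rmse = np.sqrt(np.mean((means[50:] - truth[50:]) ** 2))
zero_rmse = np.sqrt(np.mean(truth[50:] ** 2))     # error of always guessing u = 0
print(f"filter RMSE {rmse:.3f}, zero-guess RMSE {zero_rmse:.3f}")
```

Because `W` is shared by all tracers, reordering the tracer axis of `paths` leaves the filtered means unchanged, and the same functions run for any number of tracers. The low-rank `a1` stores 2dr + r numbers instead of d^2.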