Latent diffusion models (LDMs) suffer from limited diffusability in high-resolution (≤0.25°) ensemble weather forecasting, where diffusability characterizes how easily a latent data distribution can be modeled by a diffusion process. Unlike natural images, meteorological fields lack task-agnostic foundation models and explicit semantic structure, which makes regularization based on vision foundation models (VFMs) inapplicable. Moreover, existing frequency-based approaches assume spectral homogeneity and impose identical spectral regularization on every channel; because the variables in multivariate meteorological data have heterogeneous spectra, the effective regularization strength is uneven across variables. To address these challenges, we propose a 3D Masked AutoEncoder (3D-MAE) that encodes weather-state evolution features as additional conditioning for the diffusion model, together with a Variable-Aware Masked Frequency Modeling (VA-MFM) strategy that adaptively selects thresholds based on the spectral energy distribution of each variable. Combining the two, we present PuYun-LDM, which enhances latent diffusability and outperforms ENS at short lead times while remaining comparable to ENS at longer horizons. PuYun-LDM generates a 15-day global forecast at 6-hour temporal resolution in five minutes on a single NVIDIA H200 GPU, and ensemble forecasts can be produced efficiently in parallel.
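To make the idea of per-variable thresholds concrete, the following is a minimal NumPy sketch of one way a frequency cutoff could be chosen from each variable's spectral energy distribution. It is an illustration under stated assumptions, not the VA-MFM procedure itself: the abstract does not specify the rule, so the cumulative-energy criterion, the `energy_fraction` parameter, the radial binning, and the function names `radial_wavenumber`, `variable_aware_cutoffs`, and `low_pass` are all hypothetical.

```python
import numpy as np


def radial_wavenumber(h, w):
    """Integer radial wavenumber of every 2D FFT bin for an (h, w) field."""
    ky = np.fft.fftfreq(h) * h  # signed integer wavenumbers along rows
    kx = np.fft.fftfreq(w) * w  # signed integer wavenumbers along columns
    return np.rint(np.hypot(ky[:, None], kx[None, :])).astype(int)


def variable_aware_cutoffs(fields, energy_fraction=0.9):
    """Pick one cutoff wavenumber per variable (assumed criterion).

    fields: array of shape (C, H, W), one 2D field per variable.
    Returns, for each variable, the smallest radial wavenumber whose
    cumulative spectral energy reaches `energy_fraction` of the total.
    Assumes every field has nonzero energy.
    """
    c, h, w = fields.shape
    k = radial_wavenumber(h, w)
    power = np.abs(np.fft.fft2(fields, axes=(-2, -1))) ** 2
    cutoffs = np.empty(c, dtype=int)
    for i in range(c):
        # Sum spectral energy in rings of equal radial wavenumber.
        energy = np.bincount(k.ravel(), weights=power[i].ravel())
        cumulative = np.cumsum(energy) / energy.sum()
        idx = int(np.searchsorted(cumulative, energy_fraction))
        cutoffs[i] = min(idx, len(energy) - 1)
    return cutoffs


def low_pass(fields, cutoffs):
    """Keep only wavenumbers at or below each variable's own cutoff."""
    k = radial_wavenumber(*fields.shape[-2:])
    mask = k[None, :, :] <= np.asarray(cutoffs)[:, None, None]
    spectrum = np.fft.fft2(fields, axes=(-2, -1))
    return np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real


if __name__ == "__main__":
    # Two synthetic "variables": one smooth, one with finer-scale structure.
    x = np.arange(32)
    smooth = np.tile(np.sin(2 * np.pi * 2 * x / 32), (32, 1))
    rough = np.tile(np.sin(2 * np.pi * 10 * x / 32), (32, 1))
    print(variable_aware_cutoffs(np.stack([smooth, rough])))
```

With a single shared cutoff, the smooth variable would be barely affected while the rough one would lose most of its energy (or the reverse); choosing the cutoff per variable keeps the retained energy fraction comparable across variables, which is the kind of imbalance the abstract attributes to channel-uniform spectral regularization.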