Regional Climate Model Emulation with Diffusion Approaches: What is the Added Value of Generative Machine Learning?

Emulators provide a cost-effective alternative to regional climate models (RCMs) by capturing their dynamical downscaling function. They link large-scale predictors simulated by global climate models (GCMs) to RCM-simulated high-resolution fields of the target variable, here precipitation. Machine learning emulators, typically based on deep learning, require less computation time and energy than running RCMs. Among them, generative models are appealing because they can simulate ensembles of local high-resolution fields consistent with the predictors. The added value of this ensemble, which we call the uncertainty envelope, has yet to be properly assessed. Here, we make three contributions. First, we introduce ParamDiffusion, a new two-stage diffusion-based framework, and compare it with a state-of-the-art diffusion approach. Second, we expand standard validation through a comprehensive framework aligned with climate-science needs, examining specific precipitation events, including extremes. Third, within this framework, we assess the added value of diffusion approaches relative to deterministic methods. We intercompare four deep-learning models: a deterministic model designed to capture the precipitation tail; a parametric probabilistic model based on it; a recently proposed diffusion approach; and ParamDiffusion, which couples the parametric model with a diffusion model. Our results show that diffusion-based approaches reproduce climatological precipitation statistics with high skill, including distributional tails and spatially compounded extremes, while generating spatially detailed fields. However, none of the assessed models consistently captures the most extreme RCM-simulated events within its uncertainty envelope. Diffusion models are therefore promising for probabilistic RCM emulation, but progress is still required before they can reliably represent high-impact precipitation extremes.
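
To make the notion of an event falling "within the uncertainty envelope" concrete, the following is a minimal sketch, not the evaluation procedure used in this work. It assumes the envelope at each time step is simply the range between the smallest and largest ensemble member; the function name `envelope_coverage` and the toy precipitation values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def envelope_coverage(
    ensemble: Sequence[Sequence[float]],
    reference: Sequence[float],
) -> float:
    """Fraction of reference values lying inside the ensemble min-max range.

    ensemble  : one sequence per generated member, all the same length
    reference : the RCM-simulated values the emulator is compared against

    Assumption: the "envelope" is the member-wise [min, max] interval at each
    index. Quantile-based envelopes are an equally plausible choice.
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one member")
    n = len(reference)
    if n == 0:
        raise ValueError("reference must not be empty")
    if any(len(member) != n for member in ensemble):
        raise ValueError("every member must match the reference length")

    covered = 0
    for i, truth in enumerate(reference):
        values = [member[i] for member in ensemble]
        if min(values) <= truth <= max(values):
            covered += 1
    return covered / n


# Toy daily precipitation (mm/day) at one grid point: three generated members
# and the RCM reference. The last day is an extreme outside the envelope.
members = [
    [1.0, 5.0, 20.0, 40.0],
    [2.0, 7.0, 25.0, 55.0],
    [0.5, 6.0, 30.0, 60.0],
]
rcm = [1.5, 6.5, 28.0, 90.0]

coverage = envelope_coverage(members, rcm)
print(f"envelope coverage: {coverage:.2f}")
```

In this toy case the three moderate days are covered while the 90 mm/day event lies above every member, which is the kind of miss the abstract reports for the most extreme RCM-simulated events.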
