Machine learning is increasingly applied in climate modeling for accelerated system emulation, data-driven parameter inference, forecasting, and knowledge discovery, addressing challenges such as physical consistency, multi-scale coupling, data sparsity, robust generalization, and integration with scientific workflows. This paper analyzes a series of case studies from applied machine learning research in climate modeling, with a focus on design choices and workflow structure. Rather than reviewing technical details, we synthesize workflow design patterns across diverse projects in ML-enabled climate modeling, spanning surrogate modeling, ML parameterization, probabilistic programming, simulation-based inference, and physics-informed transfer learning. We unpack how these workflows are grounded in physical knowledge, informed by simulation data, and designed to integrate observations. Our goal is to offer a framework for ensuring rigor in scientific machine learning through more transparent model development, critical evaluation, informed adaptation, and reproducibility, and to help lower the barrier to interdisciplinary collaboration at the interface of data science and climate modeling.