Machine learning (ML) models of the global atmosphere that can produce stable, multi-year simulations of Earth's climate have recently been developed. However, whether these ML models generalize beyond their training distribution remains an open question. In this study, we evaluate the climate response of several state-of-the-art ML models (ACE2-ERA5, NeuralGCM, and cBottle) to a uniform sea surface temperature warming, a widely used benchmark for evaluating the response to climate change. We assess each ML model's performance relative to a physics-based general circulation model (NOAA's Geophysical Fluid Dynamics Laboratory AM4) across key diagnostics, including surface air temperature, precipitation, temperature and wind profiles, and top-of-atmosphere radiation. While the ML models reproduce key aspects of the physical model response, particularly the response of precipitation, some exhibit notable departures from robust physical responses, including radiative responses and warming over land. Our results highlight both the promise and the current limitations of ML models for climate change applications and suggest that further improvements are needed for robust out-of-sample generalization.
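As an illustration of how a response to uniform sea surface temperature warming is commonly quantified, the sketch below takes the difference between a perturbed and a control time-mean field, averages it globally with cosine-of-latitude weights, and normalizes by the imposed warming. It is a minimal sketch, not the procedure used in this study: the function names, the toy precipitation values, and the 2 K warming amplitude are all hypothetical.

```python
import math


def area_weighted_mean(field, lats_deg):
    """Global mean of a lat-lon field (one row per latitude), weighted by cos(latitude)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats_deg):
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * (sum(row) / len(row))  # zonal mean of this latitude row
        weight_total += weight
    return weighted_sum / weight_total


def climate_response(control, perturbed, lats_deg, sst_warming_k):
    """Return (global-mean change, change per kelvin of imposed SST warming)."""
    delta = area_weighted_mean(perturbed, lats_deg) - area_weighted_mean(control, lats_deg)
    return delta, delta / sst_warming_k


# Toy time-mean precipitation fields in mm/day (hypothetical values, 3 latitudes x 2 longitudes).
lats = [-60.0, 0.0, 60.0]
control_precip = [[2.0, 2.0], [6.0, 6.0], [2.0, 2.0]]
perturbed_precip = [[2.1, 2.1], [6.3, 6.3], [2.1, 2.1]]

# sst_warming_k=2.0 is an assumed amplitude chosen only for this example.
delta, per_kelvin = climate_response(control_precip, perturbed_precip, lats, sst_warming_k=2.0)
print(f"global-mean change: {delta:.3f} mm/day, per kelvin: {per_kelvin:.3f} mm/day/K")
```

Applying the same function to each ML model and to the physics-based reference would give directly comparable per-kelvin responses for a given diagnostic.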