Large data-driven physics models such as DeepMind's weather model GraphCast have empirically succeeded in parameterizing time operators for complex dynamical systems, with an accuracy reaching, and in some cases exceeding, that of traditional physics-based solvers. Unfortunately, how these data-driven models perform their computations is largely unknown, and whether their internal representations are interpretable or physically consistent is an open question. Here, we adapt tools from interpretability research on Large Language Models to analyze intermediate computational layers in GraphCast, leveraging sparse autoencoders to discover interpretable features in the neuron space of the model. We uncover distinct features across a wide range of length and time scales corresponding to tropical cyclones, atmospheric rivers, diurnal and seasonal behavior, large-scale precipitation patterns, specific geographical coding, and sea-ice extent, among others. We further demonstrate how the precise abstraction of these features can be probed via interventions on the prediction steps of the model. As a case study, we sparsely modify a feature corresponding to tropical cyclones in GraphCast and observe interpretable and physically consistent modifications to evolving hurricanes. Such methods offer a window into the black-box behavior of data-driven physics models and are a step towards realizing their potential as trustworthy predictors and scientifically valuable tools for discovery.
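
To make the setup described above concrete, the following is a minimal sketch (not the authors' implementation) of a sparse autoencoder trained on activation vectors cached from an intermediate GraphCast layer, together with a simple feature-level intervention in the spirit of the tropical-cyclone case study. The layer width, dictionary size, sparsity coefficient, learning rate, and the helper names (`encode`, `decode`, `intervene`) are illustrative assumptions, not values or code from the paper.

```python
# Sketch of an overcomplete sparse autoencoder with an L1 sparsity penalty,
# trained on hidden-state vectors collected from a GraphCast rollout.
# All dimensions and hyperparameters below are assumed for illustration.
import jax
import jax.numpy as jnp
import optax

D_MODEL = 512        # assumed width of the probed intermediate layer
D_FEATURES = 4096    # assumed overcomplete feature-dictionary size
L1_COEFF = 1e-3      # assumed weight of the sparsity penalty

def init_params(key):
    k_enc, k_dec = jax.random.split(key)
    return {
        "w_enc": jax.random.normal(k_enc, (D_MODEL, D_FEATURES)) * 0.01,
        "b_enc": jnp.zeros(D_FEATURES),
        "w_dec": jax.random.normal(k_dec, (D_FEATURES, D_MODEL)) * 0.01,
        "b_dec": jnp.zeros(D_MODEL),
    }

def encode(params, x):
    # ReLU keeps only a small number of features active per activation vector.
    return jax.nn.relu(x @ params["w_enc"] + params["b_enc"])

def decode(params, f):
    return f @ params["w_dec"] + params["b_dec"]

def loss_fn(params, x):
    f = encode(params, x)
    x_hat = decode(params, f)
    recon = jnp.mean(jnp.sum((x - x_hat) ** 2, axis=-1))   # reconstruction error
    sparsity = jnp.mean(jnp.sum(jnp.abs(f), axis=-1))      # L1 penalty on features
    return recon + L1_COEFF * sparsity

optimizer = optax.adam(1e-4)

@jax.jit
def train_step(params, opt_state, x):
    loss, grads = jax.value_and_grad(loss_fn)(params, x)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

def intervene(params, x, feature_idx, scale):
    # Feature-level intervention: rescale one learned feature and reconstruct
    # the activation that would be fed back into the model's remaining layers.
    f = encode(params, x)
    f = f.at[..., feature_idx].multiply(scale)
    return decode(params, f)

# `activations` would be a batch of cached GraphCast hidden states; random
# data stands in here only to show that the training loop runs.
key = jax.random.PRNGKey(0)
params = init_params(key)
opt_state = optimizer.init(params)
activations = jax.random.normal(key, (1024, D_MODEL))
for step in range(10):
    params, opt_state, loss = train_step(params, opt_state, activations)
```

In an actual experiment, `intervene` would replace the probed layer's activations inside the forward pass during a prediction step, so that scaling a single feature (for example, one that tracks tropical cyclones) propagates through the remainder of the rollout.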