Earth system models suffer from various structural and parametric errors in their representation of nonlinear, multi-scale processes, leading to uncertainties in their long-term projections. The effects of many of these errors (particularly those due to fast physics) can be quantified in short-term simulations, e.g., as differences between the predicted and observed states (analysis increments). With the growing availability of high-quality observations and simulations, learning nudging from these increments to correct model errors has become an active research area. However, most studies use neural networks (NNs), which, while powerful, are hard to interpret, are data-hungry, and generalize poorly out of distribution. Here, we show the capabilities of Model Error Discovery with Interpretability and Data Assimilation (MEDIDA), a general, data-efficient framework that uses sparsity-promoting equation-discovery techniques to learn model errors from analysis increments. Using two-layer quasi-geostrophic turbulence as the test case, MEDIDA is shown to successfully discover various linear and nonlinear structural/parametric errors when full observations are available. Discovery from spatially sparse observations is found to require highly accurate interpolation schemes. While NNs have shown success as interpolators in recent studies, here they are found inadequate because they cannot accurately represent small scales, a phenomenon known as spectral bias. We show that a general remedy, adding a random Fourier feature layer to the NN, resolves this issue, enabling MEDIDA to successfully discover model errors from sparse observations. These promising results suggest that, with further development, MEDIDA could be scaled up to models of the Earth system and real observations.
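To make the idea of sparsity-promoting equation discovery from analysis increments concrete, the following is a minimal sketch under several assumptions that are not taken from the abstract. It uses a hypothetical 1D periodic field instead of two-layer quasi-geostrophic turbulence, a hand-picked library of candidate terms, and sequentially thresholded least squares as a stand-in regression, since the abstract does not name the algorithm MEDIDA uses. The data assimilation step is skipped: the increments are synthesized directly from an assumed model error, `0.5*u_xx - 0.3*u*u_x`, plus small noise. All function names are illustrative.

```python
import numpy as np


def spectral_derivative(u, order, length=2 * np.pi):
    """Differentiate periodic fields along the last axis using the FFT."""
    n = u.shape[-1]
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)  # angular wavenumbers
    u_hat = np.fft.fft(u, axis=-1)
    return np.real(np.fft.ifft((1j * k) ** order * u_hat, axis=-1))


def build_library(u):
    """Return candidate-term names and a (samples x terms) library matrix."""
    ux = spectral_derivative(u, 1)
    terms = {
        "u": u,
        "u_x": ux,
        "u_xx": spectral_derivative(u, 2),
        "u_xxx": spectral_derivative(u, 3),
        "u^2": u**2,
        "u*u_x": u * ux,
    }
    names = list(terms)
    theta = np.column_stack([terms[name].ravel() for name in names])
    return names, theta


def stlsq(theta, y, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(coef) >= threshold
        coef[~keep] = 0.0
        if not keep.any():
            break
        coef[keep] = np.linalg.lstsq(theta[:, keep], y, rcond=None)[0]
    return coef


def discover_demo(seed=0):
    """Recover an assumed model error from synthetic increments."""
    rng = np.random.default_rng(seed)
    n_snapshots, n_grid, n_modes = 20, 64, 5
    x = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    k = np.arange(1, n_modes + 1)

    # Hypothetical "observed" states: random band-limited periodic fields.
    amp = rng.normal(size=(n_snapshots, n_modes)) / k
    phase = rng.uniform(0.0, 2 * np.pi, size=(n_snapshots, n_modes))
    waves = np.cos(k[None, :, None] * x[None, None, :] + phase[:, :, None])
    u = np.sum(amp[:, :, None] * waves, axis=1)

    # Assumed model error (illustrative only). In practice this would be an
    # analysis increment divided by the assimilation time step.
    increment = 0.5 * spectral_derivative(u, 2) - 0.3 * u * spectral_derivative(u, 1)
    increment += 1e-3 * rng.normal(size=u.shape)

    names, theta = build_library(u)
    coef = stlsq(theta, increment.ravel(), threshold=0.05)
    return {name: float(c) for name, c in zip(names, coef)}


if __name__ == "__main__":
    for name, value in discover_demo().items():
        print(f"{name:>6s}: {value:+.4f}")
```

In this toy setting the regression should keep only the `u_xx` and `u*u_x` terms, with coefficients close to the assumed values, and set the other candidates to zero. The threshold of 0.05 is a tuning choice for this example. Real increments would carry observation and interpolation errors, which is where the accuracy of the interpolation scheme discussed above becomes important.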