Machine learning (ML) has recently shown significant promise in modelling atmospheric systems, such as the weather. Many of these ML models are autoregressive, and error accumulation in their forecasts is a key problem. However, there is no clear definition of what `error accumulation' actually entails. In this paper, we propose a definition and an associated metric to measure it. Our definition distinguishes between errors that are due to model deficiencies, which we may hope to fix, and errors that are due to the intrinsic properties of atmospheric systems (chaos, unobserved variables), which are not fixable. We illustrate the usefulness of this definition by proposing a simple regularization loss penalty inspired by it. This approach improves performance, as measured by RMSE and spread/skill, on a selection of atmospheric systems, including a real-world weather prediction task.