We present a knowledge-guided machine learning (KGML) framework for modeling multi-scale processes and study its performance in the context of streamflow forecasting in hydrology. Specifically, we propose a novel hierarchical recurrent neural architecture that factorizes the system dynamics at multiple temporal scales and captures their interactions. The framework consists of an inverse model and a forward model. The inverse model empirically resolves the system's temporal modes from past data (physical model simulations, observations, or a combination of the two), and the forward model then uses these modes as states to predict streamflow. In a hydrological system, the modes can represent different processes evolving at different temporal scales (e.g., slow: groundwater recharge and baseflow; fast: surface runoff due to extreme rainfall). A key advantage of our framework is that, once trained, it can incorporate new observations into the model's context (internal state) without the expensive optimization approaches, such as the ensemble Kalman filter (EnKF), that are traditionally used for data assimilation in the physical sciences. Experiments with several river catchments from the NWS NCRFC region show the efficacy of this ML-based data assimilation framework compared to standard baselines, especially for basins with a long history of observations. For basins with a shorter observation history, we present two orthogonal strategies for training our FHNN framework: (a) using data from imperfect physical model simulations and (b) using observation data from multiple basins to build a global model. We show that both strategies, which can be used individually or together, are highly effective in mitigating the lack of training data. The improvement in forecast accuracy is particularly noteworthy for basins where local models perform poorly because of data sparsity.
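The inverse/forward split and the fast/slow factorization described above can be illustrated with a toy sketch. This is not the authors' architecture: the class `TwoScaleSketch`, the method names, the vanilla tanh recurrence, the fixed `slow_period`, and the random untrained weights are all assumptions made here purely to show the data flow. The point it tries to convey is that folding a new observation into the internal state is a single recurrent update rather than an optimization loop.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.3):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def rnn_step(W, U, x, h):
    """One vanilla recurrent update: h' = tanh(W x + U h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W[i], x))
            + sum(u * hi for u, hi in zip(U[i], h))
        )
        for i in range(len(h))
    ]


class TwoScaleSketch:
    """Hypothetical two-timescale recurrent model (illustrative only)."""

    def __init__(self, n_drivers, n_fast=4, n_slow=3, slow_period=7, seed=0):
        rng = random.Random(seed)
        self.n_fast, self.n_slow, self.slow_period = n_fast, n_slow, slow_period
        # Inverse model: reads drivers + observed streamflow + slow state.
        self.W_inv = rand_matrix(n_fast, n_drivers + 1 + n_slow, rng)
        self.U_inv = rand_matrix(n_fast, n_fast, rng)
        # Forward model: reads drivers + slow state (no future observations).
        self.W_fwd = rand_matrix(n_fast, n_drivers + n_slow, rng)
        self.U_fwd = rand_matrix(n_fast, n_fast, rng)
        # The slow state is refreshed from the fast state every slow_period steps.
        self.W_slow = rand_matrix(n_slow, n_fast, rng)
        self.U_slow = rand_matrix(n_slow, n_slow, rng)
        self.w_out = [rng.uniform(-0.3, 0.3) for _ in range(n_fast + n_slow)]

    def init_state(self):
        """State is (fast, slow, step_count)."""
        return [0.0] * self.n_fast, [0.0] * self.n_slow, 0

    def assimilate(self, state, drivers, flow):
        """Inverse step: fold one (drivers, observed flow) pair into the state."""
        fast, slow, step = state
        fast = rnn_step(self.W_inv, self.U_inv, drivers + [flow] + slow, fast)
        step += 1
        if step % self.slow_period == 0:
            slow = rnn_step(self.W_slow, self.U_slow, fast, slow)
        return fast, slow, step

    def forecast(self, state, future_drivers):
        """Forward step: roll the state ahead using forecast drivers only."""
        fast, slow, step = state
        preds = []
        for drivers in future_drivers:
            fast = rnn_step(self.W_fwd, self.U_fwd, drivers + slow, fast)
            step += 1
            if step % self.slow_period == 0:
                slow = rnn_step(self.W_slow, self.U_slow, fast, slow)
            preds.append(sum(w * h for w, h in zip(self.w_out, fast + slow)))
        return preds


model = TwoScaleSketch(n_drivers=2)
state = model.init_state()
# Synthetic history: two made-up drivers per step and a made-up observed flow.
history = [([0.1 * (t % 5), 0.5], 0.2 + 0.05 * (t % 3)) for t in range(14)]
for drivers, flow in history:
    state = model.assimilate(state, drivers, flow)

future = [[0.3, 0.5]] * 3
forecast_before = model.forecast(state, future)
# A new observation arrives: one extra recurrent update refreshes the state.
state = model.assimilate(state, [0.4, 0.5], 0.9)
forecast_after = model.forecast(state, future)
```

Because the weights are untrained, the forecast values themselves carry no meaning; only the structure matters. In this sketch the forward model never sees future streamflow, and the slow state changes only every `slow_period` steps, which is one simple way to mimic slow processes such as baseflow alongside fast ones such as surface runoff.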