Mini-Batch Learning Strategies for modeling long term temporal dependencies: A study in environmental applications

Shaoming Xu, Ankush Khandelwal, Xiang Li, Xiaowei Jia, Licheng Liu, Jared Willard, Rahul Ghosh, Kelly Cutler, Michael Steinbach, Christopher Duffy, John Nieber, Vipin Kumar

From arXiv: 1. Add experimental results on LSTM and Transformer. 2. Update the time-efficiency table (Table 4). 3. Share code and data.

In many environmental applications, recurrent neural networks (RNNs) are used to model physical variables with long temporal dependencies. However, under mini-batch training, temporal relationships between training segments within a batch (intra-batch) and between batches (inter-batch) are not considered, which can limit performance. Stateful RNNs aim to address this issue by passing hidden states between batches. Since Stateful RNNs ignore intra-batch temporal dependency, there is a trade-off between training stability and capturing temporal dependency. In this paper, we provide a quantitative comparison of different Stateful RNN modeling strategies and propose two strategies that enforce both intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by defining a batch as a temporally ordered set of training segments, which enables intra-batch sharing of temporal information. While this approach significantly improves performance, it leads to much longer training times because training becomes highly sequential. To address this issue, we further propose a new strategy that augments each training segment with an initial value of the target variable, taken from the timestep immediately before the segment starts. In other words, we provide an initial value of the target variable as an additional input so that the network can focus on learning changes relative to that initial value. With this strategy, samples can be passed in any order (mini-batch training), which significantly reduces training time while maintaining performance. In demonstrating our approach on hydrological modeling, we observe that the largest gains in predictive accuracy occur when these methods are applied to state variables whose values change more slowly, such as soil water and snowpack, rather than to continuously moving flux variables such as streamflow.
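The second strategy in the abstract, augmenting each training segment with the target value from the preceding timestep, can be illustrated with a minimal data-preparation sketch. It is not the authors' implementation. The function name `make_augmented_segments`, the array names, the non-overlapping segmentation, and the choice to repeat the initial value as a constant extra input channel are all assumptions made for illustration; the abstract only states that the initial value is supplied as an additional input.

```python
import numpy as np

def make_augmented_segments(drivers, target, seg_len):
    """Cut a time series into segments and attach an initial target value.

    drivers: (T, F) array of input features (hypothetical name).
    target:  (T,) array of the target variable.
    Returns X with shape (N, seg_len, F + 1) and y with shape (N, seg_len).
    """
    T, _ = drivers.shape
    X, y = [], []
    # Start at t = 1 so every segment has a preceding timestep to read from.
    for start in range(1, T - seg_len + 1, seg_len):
        end = start + seg_len
        # Target value at the timestep right before the segment begins.
        init = target[start - 1]
        # Assumption: broadcast the initial value as a constant extra channel.
        init_channel = np.full((seg_len, 1), init)
        X.append(np.concatenate([drivers[start:end], init_channel], axis=1))
        y.append(target[start:end])
    return np.stack(X), np.stack(y)

# Synthetic data, used only to exercise the function.
rng = np.random.default_rng(0)
drivers = rng.normal(size=(101, 3))
target = np.cumsum(rng.normal(size=101))  # slowly varying, state-like series

X, y = make_augmented_segments(drivers, target, seg_len=20)

# Each segment carries its own initial condition, so the segments can be
# shuffled freely before being grouped into mini-batches.
perm = rng.permutation(len(X))
X_shuffled, y_shuffled = X[perm], y[perm]
```

Because each segment carries its own starting value, no hidden state has to be passed between segments in this sketch, which is what allows the arbitrary sample order described in the abstract.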
