Training recurrent neural networks (RNNs) with standard backpropagation through time (BPTT) can be challenging, especially for long input sequences. A practical alternative that reduces computational and memory overhead is to apply BPTT repeatedly to shorter segments of the training data, a scheme known as truncated BPTT. In this paper, we examine the training of RNNs with such a truncated learning approach on time series tasks. Specifically, we establish theoretical bounds on the accuracy and performance loss incurred when optimizing over subsequences instead of the full data sequence. These bounds reveal that the burn-in phase of the RNN is an important tuning knob in its training, with significant impact on the performance guarantees. We validate our theoretical results through experiments on standard benchmarks from the fields of system identification and time series forecasting. In all experiments, we observe a strong influence of the burn-in phase on the training process, and proper tuning reduces the prediction error on both training and test data by more than 60% in some cases.
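To make the segmenting scheme concrete, the following is a minimal sketch of how a long training sequence can be partitioned for truncated BPTT with a burn-in phase. The function name and the parameters `seg_len` and `burn_in` are our illustrative notation, not fixed terminology from the paper: each loss window of length `seg_len` is preceded by `burn_in` warm-up steps that are fed through the RNN only to initialize the hidden state, without contributing to the loss or the gradient.

```python
def make_tbptt_segments(seq, seg_len, burn_in):
    """Split a long sequence into segments for truncated BPTT.

    Each returned pair is (warmup, train): `warmup` holds `burn_in`
    leading steps used only to warm up the hidden state (no loss,
    no gradient), and `train` holds the `seg_len` steps on which
    the training loss is actually computed.
    """
    segments = []
    # Loss windows tile the sequence back to back; each window is
    # preceded by its own burn-in slice taken from the raw data.
    for start in range(burn_in, len(seq) - seg_len + 1, seg_len):
        warmup = seq[start - burn_in:start]   # state warm-up only
        train = seq[start:start + seg_len]    # loss/gradient steps
        segments.append((warmup, train))
    return segments
```

During training, one would reset (or detach) the hidden state at the start of each segment, roll it forward through `warmup`, and then backpropagate only through the `train` steps; increasing `burn_in` trades extra forward computation for a hidden state closer to the one obtained from the full sequence.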