Spatiotemporal predictive learning, which uses deep learning to predict future frames from historical observations, is widely used in many fields. Previous work mainly improves model performance by widening or deepening the network, but this also causes a surge in memory overhead, which seriously hinders the development and application of this technology. To improve performance without increasing memory consumption, we focus on scale, another dimension along which model performance can be improved, but at a low memory cost. Its effectiveness has been widely demonstrated in many CNN-based tasks such as image classification and semantic segmentation, but it has not been fully explored in recent RNN models. In this paper, drawing on the benefits of multi-scale processing, we propose a general framework named Multi-Scale RNN (MS-RNN) to boost recent RNN models for spatiotemporal predictive learning. We verify the MS-RNN framework through thorough theoretical analyses and exhaustive experiments. The theory focuses on memory reduction and performance improvement, while the experiments employ eight RNN models (ConvLSTM, TrajGRU, PredRNN, PredRNN++, MIM, MotionRNN, PredRNN-V2, and PrecipLSTM) and four datasets (Moving MNIST, TaxiBJ, KTH, and Germany). The results show that RNN models incorporating our framework have much lower memory cost yet better performance than their original counterparts. Our code is released at \url{https://github.com/mazhf/MS-RNN}.
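Why scale is a cheap dimension can be checked with simple arithmetic. Recurrent state memory grows with the spatial area of each layer, and halving the resolution cuts that area by a factor of four. The sketch below is a hypothetical illustration, not the MS-RNN implementation. It assumes a stack of ConvLSTM-like layers that each keep two states (h and c) in float32. It also assumes a symmetric hourglass layout in which resolution halves going down the stack and doubles coming back up, with the channel count held constant across scales. The helper names `state_bytes`, `single_scale_bytes`, and `multi_scale_bytes` are illustrative only.

```python
# Hypothetical illustration (not MS-RNN's actual code): per-time-step memory of
# recurrent states for a single-scale stack vs. an assumed hourglass multi-scale stack.

BYTES_PER_FLOAT = 4  # float32


def state_bytes(channels: int, height: int, width: int, states_per_layer: int = 2) -> int:
    """Bytes for one layer's recurrent states (e.g. h and c) at one time step."""
    return states_per_layer * channels * height * width * BYTES_PER_FLOAT


def single_scale_bytes(num_layers: int, channels: int, height: int, width: int) -> int:
    # Every layer operates at the full input resolution.
    return sum(state_bytes(channels, height, width) for _ in range(num_layers))


def multi_scale_bytes(num_layers: int, channels: int, height: int, width: int) -> int:
    # Assumed hourglass: level goes 0, 1, 2, ..., 2, 1, 0 across the stack,
    # and each level halves the height and width of the previous one.
    total = 0
    for i in range(num_layers):
        level = min(i, num_layers - 1 - i)
        scale = 2 ** level
        total += state_bytes(channels, height // scale, width // scale)
    return total


if __name__ == "__main__":
    full = single_scale_bytes(num_layers=6, channels=64, height=64, width=64)
    multi = multi_scale_bytes(num_layers=6, channels=64, height=64, width=64)
    print(f"single-scale: {full} bytes, multi-scale: {multi} bytes, ratio {multi / full:.4f}")
```

With six layers, 64 channels, and 64x64 inputs, the area fractions of the hourglass layers are 1, 1/4, 1/16, 1/16, 1/4, and 1. Their sum is 2.625 full-resolution layers instead of 6, so state memory drops to 43.75% of the single-scale stack (5,505,024 vs. 12,582,912 bytes). Real models that widen channels at coarser scales would save less, so treat this as an upper-bound intuition.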