Spatiotemporal predictive learning, which uses deep learning to predict future frames from historical prior knowledge, is widely used in many fields. Previous work improves model performance mainly by widening or deepening the network, but this sharply increases memory overhead, which seriously hinders the development and application of the technology. To improve performance without increasing memory consumption, we focus on scale, another dimension along which model performance can be improved with low memory requirements. The effectiveness of multi-scale design has been widely demonstrated in CNN-based tasks such as image classification and semantic segmentation, but it has not been fully explored in recent RNN models. In this paper, drawing on the benefits of multi-scale design, we propose a general framework named Multi-Scale RNN (MS-RNN) to boost recent RNN models for spatiotemporal predictive learning. We verify the MS-RNN framework through theoretical analysis and extensive experiments: the theory addresses memory reduction and performance improvement, while the experiments cover eight RNN models (ConvLSTM, TrajGRU, PredRNN, PredRNN++, MIM, MotionRNN, PredRNN-V2, and PrecipLSTM) and four datasets (Moving MNIST, TaxiBJ, KTH, and Germany). The results show that RNN models incorporating our framework have much lower memory cost and better performance than their original versions. Our code is released at \url{https://github.com/mazhf/MS-RNN}.
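To give a rough intuition for why working at several scales can lower memory cost, the following minimal sketch counts hidden-state elements per time step for a stack of recurrent layers. It is an illustration under stated assumptions, not the MS-RNN architecture itself: the function name `activation_elements`, the six-layer stack, the constant channel count, and the downsampling pattern `[1, 2, 4, 4, 2, 1]` are all hypothetical choices made for this example.

```python
def activation_elements(height, width, channels, scales):
    """Count hidden-state elements for one time step across stacked layers.

    scales[i] is the (assumed) spatial downsampling factor of layer i,
    so layer i holds a (height // s) x (width // s) x channels state.
    """
    total = 0
    for s in scales:
        total += (height // s) * (width // s) * channels
    return total


# Hypothetical input resolution and channel count, chosen for illustration only.
H, W, C = 64, 64, 64

# Six layers all operating at full resolution.
single_scale = activation_elements(H, W, C, [1, 1, 1, 1, 1, 1])

# Same depth, but the middle layers run on downsampled feature maps
# (an assumed encoder-decoder-like pattern, not taken from the paper).
multi_scale = activation_elements(H, W, C, [1, 2, 4, 4, 2, 1])

ratio = multi_scale / single_scale
print(single_scale, multi_scale, ratio)
```

Under these assumptions the multi-scale stack stores fewer than half as many hidden-state elements as the single-scale stack of equal depth. Actual savings depend on the real layer layout, channel widths, and any extra memory states a given RNN cell keeps.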