Spatiotemporal predictive learning, which predicts future frames from historical prior knowledge with the aid of deep learning, is widely used in many fields. Previous work mainly improves model performance by widening or deepening the network, but this sharply increases memory overhead, which seriously hinders the development and application of the technology. To improve performance without increasing memory consumption, we focus on scale, another dimension along which model performance can be improved with low memory requirements. The effectiveness of multi-scale design has been widely demonstrated in many CNN-based tasks such as image classification and semantic segmentation, but it has not been fully explored in recent RNN models. In this paper, drawing on the benefits of multi-scale design, we propose a general framework named Multi-Scale RNN (MS-RNN) to boost recent RNN models for spatiotemporal predictive learning. By integrating different scales, we enhance existing models with both improved performance and greatly reduced overhead. We verify the MS-RNN framework through exhaustive experiments with eight popular RNN models (ConvLSTM, TrajGRU, PredRNN, PredRNN++, MIM, MotionRNN, PredRNN-V2, and PrecipLSTM) on four datasets (Moving MNIST, TaxiBJ, KTH, and Germany). The results show that RNN models incorporating our framework have much lower memory cost and better performance than their original counterparts. Our code is released at \url{https://github.com/mazhf/MS-RNN}.