Future frame prediction has been approached through two primary methods: autoregressive and non-autoregressive. Autoregressive methods rely on the Markov assumption and can achieve high accuracy in the early stages of prediction, before errors have accumulated. However, their performance tends to decline as the number of time steps increases. In contrast, non-autoregressive methods can achieve relatively high performance but do not model the correlation between predictions at different time steps. In this paper, we propose an Implicit Stacked Autoregressive Model for Video Prediction (IAM4VP), an implicit video prediction model that applies a stacked autoregressive method. Like non-autoregressive methods, stacked autoregressive methods use the same observed frames to estimate all future frames. Like autoregressive methods, however, they also take their own predictions as input: as the number of time steps increases, the predictions are stacked sequentially in a queue. To evaluate the effectiveness of IAM4VP, we conducted experiments on three common future frame prediction benchmark datasets and on weather and climate prediction benchmark datasets. The results demonstrate that our proposed model achieves state-of-the-art performance.
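The stacked autoregressive rollout described above can be illustrated with a minimal sketch. This is not the IAM4VP architecture: the function names, the `Frame` type, and the toy constant-velocity predictor are all hypothetical stand-ins chosen for illustration. The sketch only shows the control flow, in which every step receives the same observed frames together with a growing queue of earlier predictions.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for an image tensor (assumption)


def stacked_autoregressive_rollout(
    observed: Sequence[Frame],
    predict: Callable[[Sequence[Frame], Sequence[Frame], int], Frame],
    horizon: int,
) -> List[Frame]:
    """Predict `horizon` future frames.

    Every step sees the same observed frames (as in non-autoregressive
    methods) plus the queue of all predictions made so far (as in
    autoregressive methods).
    """
    queue: List[Frame] = []  # predictions stacked in temporal order
    for t in range(horizon):
        next_frame = predict(observed, queue, t)
        queue.append(next_frame)
    return queue


def toy_predict(
    observed: Sequence[Frame], queue: Sequence[Frame], t: int
) -> Frame:
    """Hypothetical predictor: extrapolate with the mean observed step."""
    # Average per-element change across the observed frames.
    steps = len(observed) - 1
    velocity = [(b - a) / steps for a, b in zip(observed[0], observed[-1])]
    # Continue from the latest prediction, or from the last observation
    # when the queue is still empty.
    last = queue[-1] if queue else observed[-1]
    return [x + v for x, v in zip(last, velocity)]


observed_frames = [[0.0, 10.0], [1.0, 12.0], [2.0, 14.0]]
future = stacked_autoregressive_rollout(observed_frames, toy_predict, horizon=3)
print(future)  # [[3.0, 16.0], [4.0, 18.0], [5.0, 20.0]]
```

In a learned model, `predict` would presumably be a neural network conditioned on the observed frames, the prediction queue, and the target time step; the toy predictor here ignores `t` and exists only to make the loop runnable.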