Spatio-temporal predictive learning is a learning paradigm in which models learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner. Despite remarkable progress in recent years, systematic understanding is still lacking because of diverse settings, complex implementations, and poor reproducibility. Without standardization, comparisons can be unfair and insights inconclusive. To address this dilemma, we propose OpenSTL, a comprehensive benchmark for spatio-temporal predictive learning that categorizes prevalent approaches into recurrent-based and recurrent-free models. OpenSTL provides a modular and extensible framework implementing various state-of-the-art methods. We conduct standard evaluations on datasets across various domains, including synthetic moving object trajectories, human motion, driving scenes, traffic flow, and weather forecasting. Based on our observations, we provide a detailed analysis of how model architecture and dataset properties affect spatio-temporal predictive learning performance. Surprisingly, we find that recurrent-free models achieve a better balance between efficiency and performance than recurrent models. Thus, we further extend the common MetaFormers to boost recurrent-free spatio-temporal predictive learning. We open-source the code and models at https://github.com/chengtan9907/OpenSTL.
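As a rough illustration of the two categories named above, the following toy sketch contrasts a recurrent-style rollout, which produces future frames one step at a time and feeds each prediction back in, with a recurrent-free rollout, which maps the whole observed sequence to the whole future sequence in a single call. It is not taken from the OpenSTL code base: the function names, the list-based `Frame` type, and the circular-shift "dynamics" are hypothetical stand-ins for learned models, and a real recurrent model would also carry hidden state across the past frames.

```python
from typing import Callable, List

Frame = List[float]  # toy "frame": a flat list of pixel values (assumption)


def shift(frame: Frame, k: int) -> Frame:
    """Circularly shift a frame k positions to the right (toy motion model)."""
    k %= len(frame)
    return frame[-k:] + frame[:-k] if k else list(frame)


def recurrent_rollout(step: Callable[[Frame], Frame],
                      past: List[Frame], horizon: int) -> List[Frame]:
    """Recurrent-style prediction: each predicted frame is fed back as input."""
    current = past[-1]
    future = []
    for _ in range(horizon):
        current = step(current)
        future.append(current)
    return future


def recurrent_free_rollout(predict_all: Callable[[List[Frame]], List[Frame]],
                           past: List[Frame]) -> List[Frame]:
    """Recurrent-free prediction: all future frames come from one forward call."""
    return predict_all(past)


def one_step_model(frame: Frame) -> Frame:
    """Hypothetical stand-in for a learned single-step predictor."""
    return shift(frame, 1)


def one_shot_model(past: List[Frame]) -> List[Frame]:
    """Hypothetical stand-in for a learned sequence-to-sequence predictor.

    Predicts as many future frames as there are past frames.
    """
    last = past[-1]
    return [shift(last, k) for k in range(1, len(past) + 1)]


def mse(pred: List[Frame], target: List[Frame]) -> float:
    """Mean squared error over all pixels of all frames."""
    diffs = [(p - t) ** 2
             for pf, tf in zip(pred, target)
             for p, t in zip(pf, tf)]
    return sum(diffs) / len(diffs)


# A single bright pixel moving one position per frame.
past_frames = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
true_future = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

pred_recurrent = recurrent_rollout(one_step_model, past_frames, horizon=2)
pred_one_shot = recurrent_free_rollout(one_shot_model, past_frames)
print(mse(pred_recurrent, true_future), mse(pred_one_shot, true_future))
```

In this toy setting both strategies recover the same future frames. The practical difference the abstract points to is cost: the recurrent rollout needs one sequential model call per predicted frame, while the recurrent-free rollout needs a single call.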