Spatio-temporal predictive learning is a learning paradigm in which models learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner. Despite remarkable progress in recent years, a systematic understanding is still lacking because of diverse settings, complex implementations, and poor reproducibility. Without standardization, comparisons can be unfair and insights inconclusive. To address this dilemma, we propose OpenSTL, a comprehensive benchmark for spatio-temporal predictive learning that categorizes prevalent approaches into recurrent-based and recurrent-free models. OpenSTL provides a modular and extensible framework implementing various state-of-the-art methods. We conduct standard evaluations on datasets across various domains, including synthetic moving object trajectories, human motion, driving scenes, traffic flow, and weather forecasting. Based on our observations, we provide a detailed analysis of how model architecture and dataset properties affect spatio-temporal predictive learning performance. Surprisingly, we find that recurrent-free models achieve a better balance between efficiency and performance than recurrent models. Thus, we further extend the common MetaFormers to boost recurrent-free spatio-temporal predictive learning. We open-source the code and models at https://github.com/chengtan9907/OpenSTL.
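To make the distinction between the two model families concrete, the following is a minimal sketch of the prediction setup, not the OpenSTL API. It assumes a toy representation (a frame is a nested list of floats), a hypothetical "moving dot" sequence, and mean squared error as the metric; all function names are illustrative. A recurrent-style predictor generates future frames one at a time and feeds each prediction back in, whereas a recurrent-free predictor maps all past frames to all future frames in a single call.

```python
from typing import Callable, List

# Toy representation (an assumption for illustration): one H x W grayscale frame.
Frame = List[List[float]]


def shift_right(frame: Frame) -> Frame:
    """Toy dynamics: move every pixel one column to the right, wrapping around."""
    return [row[-1:] + row[:-1] for row in frame]


def recurrent_rollout(step: Callable[[Frame], Frame],
                      past: List[Frame], horizon: int) -> List[Frame]:
    """Recurrent-style prediction: produce frames one by one,
    feeding each predicted frame back in as the next input."""
    current = past[-1]
    future = []
    for _ in range(horizon):
        current = step(current)
        future.append(current)
    return future


def recurrent_free_predict(model: Callable[[List[Frame]], List[Frame]],
                           past: List[Frame]) -> List[Frame]:
    """Recurrent-free prediction: map all past frames to all future
    frames in a single call, with no frame-by-frame feedback loop."""
    return model(past)


def make_one_shot_model(horizon: int) -> Callable[[List[Frame]], List[Frame]]:
    """Hypothetical stand-in for a learned recurrent-free network:
    the k-th output is the last observed frame shifted k times."""
    def model(past: List[Frame]) -> List[Frame]:
        outputs = []
        for k in range(1, horizon + 1):
            frame = past[-1]
            for _ in range(k):
                frame = shift_right(frame)
            outputs.append(frame)
        return outputs
    return model


def mse(pred: List[Frame], target: List[Frame]) -> float:
    """Mean squared error over all pixels of all frames."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p_row, t_row in zip(p_frame, t_frame):
            for p, t in zip(p_row, t_row):
                total += (p - t) ** 2
                count += 1
    return total / count


# Build a 6-frame toy sequence: a single bright pixel moving right.
frames: List[Frame] = [[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]]
for _ in range(5):
    frames.append(shift_right(frames[-1]))

past, target = frames[:3], frames[3:]  # 3 observed frames, 3 frames to predict
pred_recurrent = recurrent_rollout(shift_right, past, horizon=3)
pred_one_shot = recurrent_free_predict(make_one_shot_model(3), past)
```

On this toy sequence both predictors recover the target frames exactly, since the dynamics are known. The sketch only contrasts the two calling patterns: the rollout needs one sequential step per future frame, while the one-shot call has no such dependency between output frames.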