In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there is still no standardized framework for detailed, comparative analysis of different prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and appropriate experimental settings, and implementing multi-dimensional evaluations. The benchmark integrates 12 widely adopted methods with 15 diverse datasets across multiple application domains, offering an extensive evaluation of contemporary spatio-temporal prediction networks. By carefully calibrating prediction settings for each application, PredBench keeps evaluations relevant to their intended use and enables fair comparisons. Moreover, its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics, providing deeper insight into model capabilities. Our findings offer strategic directions for future developments in the field. Our codebase is available at https://github.com/WZDTHU/PredBench.