In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, the community still lacks a standardized framework for detailed, comparative analysis of prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, maintaining standardized and appropriate experimental settings, and implementing multi-dimensional evaluations. The benchmark integrates 12 widely adopted methods with 15 diverse datasets spanning multiple application domains, offering an extensive evaluation of contemporary spatio-temporal prediction networks. By carefully calibrating prediction settings for each application, PredBench ensures that evaluations are relevant to their intended use and that comparisons are fair. Moreover, its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics, providing deep insights into model capabilities. Our findings offer strategic directions for future developments in the field. Our codebase is available at https://github.com/OpenEarthLab/PredBench.