Deep learning methods achieve remarkable predictive performance in modeling complex, large-scale data. However, assessing the quality of the resulting models has become increasingly challenging, as classical statistical assumptions may no longer apply. These difficulties are particularly pronounced for spatio-temporal data, which exhibit dependencies across both space and time and are often characterized by nonlinear dynamics, time variance, and missing observations, hence calling for new accuracy assessment methodologies. This paper introduces a residual correlation analysis framework for assessing the optimality of relational spatio-temporal neural predictive models, notably in settings with incomplete and heterogeneous data. By leveraging the principle that residual correlation indicates information not captured by the model, the framework enables the identification and localization of regions in space and time where predictive performance can be improved. A strength of the proposed approach is that it operates under minimal assumptions, which allows robust evaluation of deep learning models applied to multivariate time series, even in the presence of missing and heterogeneous data. In detail, the methodology constructs tailored spatio-temporal graphs to encode sparse spatial and temporal dependencies and employs asymptotically distribution-free summary statistics to detect time intervals and spatial regions where the model underperforms. The effectiveness of the proposed approach is demonstrated through experiments on both synthetic and real-world datasets using state-of-the-art predictive models.
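As a rough illustration of the kind of statistic the abstract describes, the following Python sketch computes a sign-based residual correlation score over the edges of a simple spatio-temporal graph. It is a sketch under stated assumptions, not the paper's implementation: the function name `sign_edge_statistic`, the unit edge weights, the temporal edges linking consecutive time steps of the same node, and the use of NaN to mark missing observations are all hypothetical choices made for this example. If the residuals are independent, continuous, and have zero median, the sign products on distinct edges are uncorrelated ±1 variables, so the normalized sum has zero mean and unit variance and, under mild conditions, is approximately standard normal regardless of the residual distribution.

```python
import numpy as np


def sign_edge_statistic(residuals, spatial_edges):
    """Sign-based correlation score over a spatio-temporal graph (illustrative).

    residuals     : array of shape (T, N); NaN marks a missing observation.
    spatial_edges : iterable of distinct unordered node pairs (i, j), i != j.

    Assumptions of this sketch (not taken from the paper):
      * temporal edges connect (t, i) with (t + 1, i);
      * spatial edges connect (t, i) with (t, j) at the same time step;
      * every edge has unit weight;
      * an edge is dropped if either endpoint is missing.
    """
    residuals = np.asarray(residuals, dtype=float)
    observed = ~np.isnan(residuals)
    # Missing entries get sign 0; they are also masked out below.
    signs = np.sign(np.nan_to_num(residuals, nan=0.0))

    total = 0.0   # sum of sign products over valid edges
    n_edges = 0   # number of valid edges (sum of squared unit weights)

    # Temporal edges: same node, consecutive time steps.
    valid_t = observed[:-1] & observed[1:]
    total += (signs[:-1] * signs[1:])[valid_t].sum()
    n_edges += int(valid_t.sum())

    # Spatial edges: different nodes, same time step.
    for i, j in spatial_edges:
        valid_s = observed[:, i] & observed[:, j]
        total += (signs[:, i] * signs[:, j])[valid_s].sum()
        n_edges += int(valid_s.sum())

    # Normalize so the score has unit variance under the null hypothesis.
    return total / np.sqrt(n_edges) if n_edges else 0.0
```

A score far from zero suggests that correlation remains in the residuals. Restricting the edges to a time window or to a subset of nodes yields a local version of the same score, which is one simple way to point at where a model may be underperforming.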