This study presents a comparative analysis of three predictive models of increasing flexibility: hidden dynamic geostatistical models (HDGM), generalised additive mixed models (GAMM), and random forest spatiotemporal kriging models (RFSTK). The models are evaluated on their ability to predict PM$_{2.5}$ concentrations in Lombardy (Northern Italy) from 2016 to 2020. Despite their differing methodologies, all three models capture the spatiotemporal patterns in the air pollution data and achieve similar out-of-sample performance. Station-specific analyses further reveal that model performance varies with localised conditions. Model interpretation, based on parametric coefficient analysis and partial dependence plots, reveals consistent associations between the predictor variables and PM$_{2.5}$ concentrations. Although the models differ in how they represent spatiotemporal correlation, all of them effectively account for the underlying dependence. In summary, the study underscores the efficacy of conventional techniques for modelling correlated spatiotemporal data, while highlighting the complementary potential of machine learning and classical statistical approaches.