Interpretable PM2.5 Forecasting for Urban Air Quality: A Comparative Study of Operational Time-Series Models

Accurate short-term air-quality forecasting is essential for public health protection and urban management, yet many recent forecasting frameworks rely on complex, data-intensive, and computationally demanding models. This study investigates whether lightweight, interpretable forecasting approaches can provide competitive performance for hourly PM2.5 prediction in Beijing, China. Using multi-year pollutant and meteorological time-series data, we developed a leakage-aware forecasting workflow that combined chronological data partitioning, preprocessing, feature selection, and exogenous-driver modeling under the Perfect Prognosis setting. Three forecasting families were evaluated: SARIMAX, Facebook Prophet, and NeuralProphet. To assess practical deployment behavior, each model was tested under two adaptive regimes: weekly walk-forward refitting and frozen forecasting with online residual correction. The models differed clearly in both predictive accuracy and computational cost. Under walk-forward refitting, Facebook Prophet achieved the best completed performance (MAE $37.61$; RMSE $50.10$) while requiring substantially less execution time than NeuralProphet. In the frozen-model regime, online residual correction improved both Facebook Prophet and SARIMAX, and corrected SARIMAX yielded the lowest overall error (MAE $32.50$; RMSE $46.85$). NeuralProphet was less accurate and less stable in both regimes, and residual correction did not improve its forecasts. Notably, corrected Facebook Prophet reached nearly the same error as its walk-forward counterpart while reducing runtime from $15$ min $21.91$ sec to $46.60$ sec. These findings show that lightweight additive forecasting strategies can remain highly competitive for urban air-quality prediction, offering a practical balance between accuracy, interpretability, ...
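
The abstract contrasts two adaptive regimes but does not state the residual-correction rule. The Python sketch below illustrates the difference under explicit assumptions: a toy hour-of-day mean model (hypothetical `fit_hourly_profile`) stands in for SARIMAX, Facebook Prophet, or NeuralProphet; the online correction is assumed to be an exponentially weighted moving average (EWMA) of past residuals with a hypothetical smoothing factor `alpha`; and exogenous drivers are omitted. None of this is claimed to be the authors' implementation. The synthetic series applies a +20 level shift in the test period so the two regimes react differently, and MAE and RMSE are computed in the usual way.

```python
import math
from statistics import mean

HOURS_PER_WEEK = 168  # weekly refit cadence, as in the walk-forward regime


def fit_hourly_profile(history):
    """Toy stand-in model (hypothetical): mean value for each hour of day.

    `history` is a chronological list of (hour_of_day, value) pairs.
    """
    buckets = {h: [] for h in range(24)}
    for hour, value in history:
        buckets[hour].append(value)
    overall = mean(v for _, v in history)
    # Hours never seen in training fall back to the overall mean.
    return {h: (mean(vals) if vals else overall) for h, vals in buckets.items()}


def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def walk_forward(train, test, step=HOURS_PER_WEEK):
    """Refit on everything observed so far, then forecast the next `step` hours."""
    observed, preds = list(train), []
    for start in range(0, len(test), step):
        block = test[start:start + step]
        profile = fit_hourly_profile(observed)        # refit before each block
        preds.extend(profile[hour] for hour, _ in block)
        observed.extend(block)                        # actuals revealed only after forecasting
    return preds


def frozen_with_correction(train, test, alpha=0.1):
    """Fit once, then add an EWMA of past residuals (assumed correction rule)."""
    profile = fit_hourly_profile(train)               # model is never refit
    correction, preds = 0.0, []
    for hour, actual in test:
        base = profile[hour]
        preds.append(base + correction)               # uses only residuals already observed
        residual = actual - base                      # revealed after this hour's forecast
        correction = alpha * residual + (1 - alpha) * correction
    return preds


def make_series(days, offset):
    """Synthetic hourly series: fixed daily cycle plus a constant offset."""
    return [(t % 24, 50 + 10 * math.sin(2 * math.pi * (t % 24) / 24) + offset)
            for t in range(days * 24)]


train = make_series(28, 0.0)    # four weeks of history
test = make_series(14, 20.0)    # two test weeks after a +20 level shift
y_test = [v for _, v in test]

wf = walk_forward(train, test)
frozen_plain = frozen_with_correction(train, test, alpha=0.0)  # alpha=0 disables correction
frozen_corr = frozen_with_correction(train, test, alpha=0.1)

for name, preds in [("walk-forward", wf), ("frozen", frozen_plain),
                    ("frozen + correction", frozen_corr)]:
    print(f"{name}: MAE={mae(y_test, preds):.2f} RMSE={rmse(y_test, preds):.2f}")
```

In this toy setting the persistent bias favors the corrected frozen model: the EWMA absorbs the shift within a few dozen hours, whereas each weekly refit only partly dilutes it by averaging the shifted week with older history. The example illustrates the mechanism only and does not reproduce the study's results.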