Global air quality forecasting faces extreme spatial heterogeneity, and existing transductive models generalize poorly to unseen regions. To address this, we propose OmniAir, a semantic topology learning framework for global station-level prediction. OmniAir encodes invariant physical environmental attributes into generalizable station identities and dynamically constructs adaptive sparse topologies, allowing it to capture long-range non-Euclidean correlations and physical diffusion patterns across unevenly distributed global networks. We further curate WorldAir, a large-scale dataset covering over 7,800 stations worldwide. Extensive experiments show that OmniAir achieves state-of-the-art performance against 18 baselines while remaining efficient and scalable, running nearly 10 times faster than existing models and helping to bridge the monitoring gap in data-sparse regions.
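To make the two ideas in the abstract concrete (station identities derived from physical attributes, and an adaptive sparse topology), the following is a minimal illustrative sketch and not OmniAir's actual implementation. It assumes a single random linear projection in place of a learned encoder, cosine similarity as the affinity measure, and top-k selection as the sparsification rule; the function names `station_embeddings` and `sparse_topk_adjacency`, the attribute count, and all sizes are hypothetical.

```python
import numpy as np


def station_embeddings(attrs, weight):
    """Map per-station attribute vectors to unit-norm embeddings.

    Assumption: one linear layer plus tanh stands in for a learned encoder.
    """
    z = np.tanh(attrs @ weight)
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, 1e-12)


def sparse_topk_adjacency(emb, k):
    """Build a row-normalised adjacency keeping k neighbours per station.

    Assumption: neighbours are the k stations with the highest cosine
    similarity in embedding space, regardless of geographic distance.
    Requires k <= number of stations - 1.
    """
    n = emb.shape[0]
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from the top-k
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]  # top-k column indices
    rows = np.arange(n)[:, None]
    vals = sim[rows, idx]
    # Softmax over the kept neighbours so each row sums to 1.
    w = np.exp(vals - vals.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    adj = np.zeros((n, n))
    adj[rows, idx] = w
    return adj


# Hypothetical data: 6 stations, 4 static attributes each.
rng = np.random.default_rng(0)
attrs = rng.normal(size=(6, 4))
weight = rng.normal(size=(4, 3))  # stands in for trained encoder weights

emb = station_embeddings(attrs, weight)
adj = sparse_topk_adjacency(emb, k=2)
print(adj.round(3))
```

Because the embeddings in this sketch depend only on station attributes and not on a per-station lookup table, a station absent from training can still be embedded and linked into the graph, which is the kind of inductive behavior the abstract contrasts with transductive models. Each row holds only k nonzero entries, so a sparse-matrix version of this graph would need memory that grows linearly with the number of stations, although the dense similarity matrix computed here is itself quadratic.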