Ground-Level Near Real-Time Modeling for PM2.5 Pollution Prediction

Air pollution is a worldwide public health threat that can cause or exacerbate many illnesses, including respiratory disease, cardiovascular disease, and some cancers. However, epidemiological studies and public health decision-making are hampered by the inability to assess pollution exposure impacts in near real time. Accurate digital twins of environmental pollutants would enable timely, data-driven analytics, a crucial step in modernizing health policy and decision-making. Although other models predict and analyze fine particulate matter (PM2.5) exposure, they often rely on modeled input data and on data streams that are not regularly updated. Many also depend on predefined grids. In contrast, our deep-learning approach interpolates surface-level PM2.5 concentrations between sparsely distributed US EPA monitoring stations in a grid-free manner. By incorporating additional, readily available datasets, including topographic, meteorological, and land-use data, we improve the model's ability to predict pollutant concentrations with high spatial and temporal resolution. Because the approach is grid-free, the model can be queried at any spatial location for rapid predictions without computing over an entire grid. To ensure robustness, we randomize spatial sampling during training so that the model performs well in both densely and sparsely monitored regions. The model is well suited for near real-time deployment because its lightweight architecture allows fast updates in response to streaming data. Moreover, its flexibility and scalability allow it to be adapted to various geographical contexts and scales, making it a practical tool for delivering accurate and timely air quality assessments. Its capacity to rapidly evaluate multiple scenarios can be especially valuable for decision-making during public health crises.
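As a rough illustration of two ideas in the abstract, randomized spatial sampling during training and grid-free querying at arbitrary locations, the following Python sketch may help. It is not the authors' model: the deep-learning interpolator is replaced by a simple inverse-distance-weighting stand-in, the station data are synthetic, and the names `sample_episode` and `idw_predict`, along with all parameters, are hypothetical.

```python
import numpy as np


def sample_episode(coords, pm25, rng, min_context=3):
    """Randomly split stations into context (inputs) and target (held-out) sets.

    The context size is drawn at random on each call, so a model trained on
    such episodes would see both densely and sparsely monitored conditions.
    (Hypothetical helper; the abstract does not specify the sampling scheme.)
    """
    n = len(coords)
    if n <= min_context:
        raise ValueError("need more stations than min_context")
    # Upper bound is exclusive, so at least one station is always held out.
    n_context = rng.integers(min_context, n)
    order = rng.permutation(n)
    ctx, tgt = order[:n_context], order[n_context:]
    return (coords[ctx], pm25[ctx]), (coords[tgt], pm25[tgt])


def idw_predict(query, ctx_coords, ctx_values, power=2.0, eps=1e-9):
    """Inverse-distance-weighted estimate at arbitrary query points (no grid).

    Stand-in for a learned model: it only illustrates point-wise querying.
    """
    query = np.atleast_2d(query)
    # Pairwise distances between query points and context stations.
    d = np.linalg.norm(query[:, None, :] - ctx_coords[None, :, :], axis=-1)
    w = 1.0 / (d + eps) ** power
    return (w @ ctx_values) / w.sum(axis=1)


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(20, 2))  # synthetic station positions
pm25 = rng.uniform(5.0, 35.0, size=20)          # synthetic PM2.5 readings

(ctx_xy, ctx_v), (tgt_xy, tgt_v) = sample_episode(coords, pm25, rng)
pred = idw_predict(tgt_xy, ctx_xy, ctx_v)
print("context stations:", len(ctx_xy), "held-out stations:", len(tgt_xy))
print("mean absolute error on held-out stations:", np.abs(pred - tgt_v).mean())
```

Each call to `sample_episode` yields a different context density, and `idw_predict` evaluates only at the requested coordinates. In a real system, the stand-in would presumably be replaced by a trained network that also takes topographic, meteorological, and land-use features as inputs.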
