Data fusion models are widely used in air quality monitoring to integrate in situ observations and large-scale gridded products, offering spatially complete and temporally detailed estimates. However, traditional Gaussian-based models often underestimate extreme pollution values, leading to biased risk assessments. To address this, we present a Bayesian hierarchical data fusion framework rooted in extreme value theory, using the Dirac-delta generalised Pareto distribution to jointly model threshold exceedances and non-exceedances while preserving the timing of exceedance and non-exceedance episodes. We use the model to describe and predict censored threshold exceedances of PM2.5 pollution in the Greater London region, combining the CAMS atmospheric composition reanalysis with in situ observations from the Automatic Urban and Rural Network (AURN) run by the UK government. Key features of our approach include combining data with different spatio-temporal resolutions and fully accounting for parameter uncertainty. Results show that our model outperforms Gaussian-based alternatives and the standalone reanalysis data in predicting threshold exceedances at the majority of observation sites, and can even yield clearer spatial patterns of PM2.5 pollution than those discernible from the background data. Moreover, our approach captures greater variability and spatial patterns, such as higher PM2.5 concentrations near coastal areas, that are not evident in the reanalysis data alone.
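To make the idea of jointly modelling exceedances and non-exceedances concrete, the following is a minimal sketch of a point-mass-plus-generalised-Pareto likelihood for censored threshold excesses. It assumes the common formulation in which values at or below a threshold u are censored to zero and receive probability 1 - p, while positive excesses follow a generalised Pareto distribution with scale sigma and shape xi, weighted by p. The function names and the symbols u, p, sigma and xi are illustrative assumptions, not notation taken from the paper, and the sketch omits the hierarchical and spatio-temporal structure of the full model.

```python
import numpy as np

def censor_exceedances(x, u):
    """Censor raw concentrations at threshold u: excess = max(x - u, 0)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(x - u, 0.0)

def gpd_logpdf(y, sigma, xi):
    """Log-density of a generalised Pareto distribution at excesses y > 0."""
    y = np.asarray(y, dtype=float)
    if abs(xi) < 1e-12:
        # xi -> 0 limit: exponential distribution with scale sigma
        return -np.log(sigma) - y / sigma
    z = 1.0 + xi * y / sigma
    with np.errstate(divide="ignore", invalid="ignore"):
        out = -np.log(sigma) - (1.0 / xi + 1.0) * np.log(z)
    # Outside the support (z <= 0) the density is zero
    return np.where(z > 0, out, -np.inf)

def dgpd_loglik(y, p, sigma, xi):
    """Log-likelihood of the assumed mixture: mass (1 - p) at zero for
    non-exceedances, and p times a GPD density for positive excesses."""
    y = np.asarray(y, dtype=float)
    exceed = y > 0
    ll_zero = np.log1p(-p) * np.sum(~exceed)
    ll_exc = np.sum(np.log(p) + gpd_logpdf(y[exceed], sigma, xi))
    return ll_zero + ll_exc

# Hypothetical daily PM2.5 values and threshold, for illustration only
excesses = censor_exceedances([10.0, 30.0, 25.0, 41.0], u=25.0)
print(excesses, dgpd_loglik(excesses, p=0.3, sigma=5.0, xi=0.1))
```

Because every time step contributes a term, either as a zero or as a positive excess, the time series keeps the position of each exceedance episode rather than discarding non-exceedances, which is the property the abstract highlights.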