Climate change is arguably the most important environmental problem the Earth currently faces, and it affects all living species. Because air-quality monitoring stations are typically ground-based, their ability to capture pollutant distributions over wide areas is limited. Satellites, however, can observe the atmosphere at large scale; Sentinel-5P, a recently launched satellite of the European Space Agency (ESA) Copernicus project, measures a variety of pollutants and makes its data publicly available. This paper seeks to create a multi-modal machine learning model for predicting air-quality metrics where monitoring stations do not exist. The model takes as input a fusion of ground measurements and satellite data, with the goal of highlighting pollutant distribution and motivating change in societal and industrial behaviors. A new dataset of European pollution monitoring station measurements is created, with features including $\textit{altitude, population, etc.}$ from the ESA Copernicus project. This dataset is used to train a multi-modal ML model, Air Quality Network (AQNet), capable of fusing these data sources to output predictions of several pollutants. These predictions are then aggregated to create an "air-quality index" that could be used to compare air quality across regions. Three pollutants, NO$_2$, O$_3$, and PM$_{10}$, are predicted successfully by AQNet, and the network compares favorably with a model using only satellite imagery. The addition of supporting data was also found to improve predictions. When testing AQNet on out-of-sample data from the UK and Ireland, we obtain satisfactory estimates, though pollution metrics were overestimated by around 20\% on average.