M$^2$FedAQI: Multimodal Federated Learning for Air Quality Prediction on Heterogeneous Edge Devices

Accurate air quality prediction is essential for public health, environmental monitoring, and industrial safety. However, most existing approaches rely on centralized learning paradigms, which introduce challenges related to scalability, privacy preservation, and communication overhead in distributed Internet of Things (IoT) environments. Moreover, current federated learning (FL)-based solutions predominantly use unimodal data, which limits their ability to capture complex environmental patterns. To address these limitations, we propose M$^2$FedAQI, a lightweight multimodal federated framework for decentralized Air Quality Index (AQI) prediction across heterogeneous edge devices. The framework integrates visual and tabular modalities through a feature-modulation-based fusion mechanism that enables efficient cross-modal interaction while keeping computational overhead low. M$^2$FedAQI is evaluated on two benchmark datasets, PM25Vision and TRAQID, for both classification and regression tasks under centralized and federated settings. Experimental results show that M$^2$FedAQI consistently outperforms existing approaches, achieving improvements of up to 11.0\% in Accuracy, 3.53\% in AUC, 12.2\% in F1-score, and 18.0\% in $R^2$, while reducing MAE and RMSE by up to 25.4\% and 20.4\%, respectively, compared with the strongest baselines. Furthermore, deployment on heterogeneous edge devices demonstrates efficient resource utilization in terms of communication overhead, memory footprint, and computational cost. To enhance communication security, TLS-based authentication is incorporated to ensure that only authenticated clients participate and to protect the FL communication channel from unauthorized third-party access, without modifying the underlying FL protocol.
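
The abstract does not specify the exact form of the feature-modulation fusion, so the following is only a minimal sketch of one common variant: a FiLM-style scale-and-shift, in which the tabular features predict a per-channel scale and shift that are applied to the visual features. All names here (`linear`, `init_linear`, `film_fuse`), the dimensions, and the random initialization are hypothetical and chosen purely for illustration; they are not taken from M$^2$FedAQI itself.

```python
import random


def linear(x, weights, bias):
    """Dense layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [
        sum(xi * w_row[j] for xi, w_row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def init_linear(n_in, n_out, rng, scale=0.1):
    """Hypothetical initializer: small Gaussian weights, zero bias."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
    bias = [0.0] * n_out
    return weights, bias


def film_fuse(visual_feat, tabular_feat, gamma_params, beta_params):
    """Modulate visual features with a scale and shift predicted from tabular features.

    Assumption: FiLM-style modulation, fused = (1 + gamma) * visual + beta.
    """
    gamma = linear(tabular_feat, *gamma_params)
    beta = linear(tabular_feat, *beta_params)
    # Using (1 + gamma) keeps the fusion close to identity when gamma is near zero.
    return [(1.0 + g) * v + b for v, g, b in zip(visual_feat, gamma, beta)]


rng = random.Random(0)
visual_dim, tabular_dim = 8, 3  # illustrative sizes only

gamma_params = init_linear(tabular_dim, visual_dim, rng)
beta_params = init_linear(tabular_dim, visual_dim, rng)

# Stand-ins for an image-encoder output and normalized sensor readings.
visual_feat = [rng.gauss(0.0, 1.0) for _ in range(visual_dim)]
tabular_feat = [0.4, -1.2, 0.7]

fused = film_fuse(visual_feat, tabular_feat, gamma_params, beta_params)
```

A modulation of this kind adds only two small dense layers on top of the unimodal encoders, which is one plausible reason such a fusion could stay lightweight on edge devices. The fused vector has the same dimensionality as the visual features, so it can feed either a classification or a regression head unchanged.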
