Fusion of Pervasive RF Data with Spatial Images via Vision Transformers for Enhanced Mapping in Smart Cities

From arXiv. Work supported by funding under the bilateral agreement between CNR (Italy) and HESC MESCS RA (Armenia) as part of the DeepRF project for the 2025-2026 biennium, and by the HESC MESCS RA grant No. 22rl-052 (DISTAL).

In this paper, we present a deep learning-based approach that integrates the DINOv2 architecture to improve building mapping by combining (possibly erroneous) maps from open-source platforms with pervasive radio frequency (RF) data collected from multiple wireless user equipment (UE) devices and base stations. Unlike prior methods, our approach uses a vision transformer-based architecture to jointly process the RF and map modalities within a unified framework, capturing spatial dependencies and structural priors that improve mapping accuracy. For evaluation, we use a synthetic dataset co-produced by Huawei. To account for the imperfections of real-world data, we add controlled noise to its RF data to simulate real-world conditions. Additionally, we develop and train a model that uses only aggregated path loss information to tackle the mapping problem. We evaluate performance with three metrics: the Jaccard index (intersection over union, IoU), the Hausdorff distance, and the Chamfer distance. Our design achieves a macro IoU of 65.3%, significantly surpassing (i) the erroneous-maps baseline, which yields 40.1%, (ii) an RF-only method from the literature, which yields 37.3%, and (iii) a non-AI fusion baseline of our own design, which yields 42.2%. This comparative evaluation highlights the limitations of relying solely on RF data or solely on spatial data, as well as the effectiveness of AI-based data fusion for improving smart-city mapping accuracy. We further validate our method on real-world data from the Oslo region, complementing the synthetic evaluation with a real deployment setting in which our best fusion model reaches a macro IoU of 64.9%. Finally, we outline a strategy for deploying the model over larger areas by tiling the region with overlapping windows.
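The three evaluation metrics named in the abstract can be made concrete with a short sketch. The Python below is illustrative only and rests on assumptions the abstract does not state: building maps are binary rasters (1 = building, 0 = background); "macro IoU" is taken to mean the unweighted mean of the per-class IoU over the building and background classes (the paper may instead average over scenes); and the Chamfer distance is taken as the symmetric mean of Euclidean nearest-neighbor distances between building pixels (other definitions use squared distances or a sum). All function names are hypothetical.

```python
from __future__ import annotations

import math


def class_iou(pred: list[list[int]], gt: list[list[int]], cls: int) -> float:
    """Jaccard index (intersection / union) for the pixels labelled `cls`."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            in_p, in_g = p == cls, g == cls
            inter += int(in_p and in_g)
            union += int(in_p or in_g)
    # Assumption: a class absent from both maps counts as a perfect match.
    return 1.0 if union == 0 else inter / union


def macro_iou(pred: list[list[int]], gt: list[list[int]]) -> float:
    """Assumed definition: unweighted mean of building (1) and background (0) IoU."""
    return (class_iou(pred, gt, 1) + class_iou(pred, gt, 0)) / 2


def building_points(mask: list[list[int]]) -> list[tuple[int, int]]:
    """(row, col) coordinates of building pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == 1]


def _nn_dists(src: list[tuple[int, int]], dst: list[tuple[int, int]]) -> list[float]:
    """Distance from each point in src to its nearest point in dst (brute force, pixels)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hausdorff(pred: list[list[int]], gt: list[list[int]]) -> float:
    """Symmetric Hausdorff distance between building pixel sets, in pixels."""
    a, b = building_points(pred), building_points(gt)
    if not a or not b:
        return math.inf  # assumption: undefined when one map has no buildings
    return max(max(_nn_dists(a, b)), max(_nn_dists(b, a)))


def chamfer(pred: list[list[int]], gt: list[list[int]]) -> float:
    """Assumed variant: average of the two mean nearest-neighbor distances, in pixels."""
    a, b = building_points(pred), building_points(gt)
    if not a or not b:
        return math.inf
    d_ab, d_ba = _nn_dists(a, b), _nn_dists(b, a)
    return 0.5 * (sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba))


if __name__ == "__main__":
    gt = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    pred = [[0, 1, 1], [0, 1, 0], [0, 0, 1]]  # one building pixel displaced by one cell
    print(class_iou(pred, gt, 1))  # 0.6
    print(round(macro_iou(pred, gt), 4))  # 0.6333
    print(hausdorff(pred, gt), chamfer(pred, gt))  # 1.0 0.25
```

IoU rewards pixel overlap, while the two distance metrics measure how far predicted building outlines stray from the reference; distances come out in pixels and would be multiplied by the raster resolution to obtain metres. The brute-force nearest-neighbor search is quadratic and only meant for small examples.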

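The large-area deployment strategy mentioned at the end of the abstract (tiling the region with overlapping windows) can be sketched as follows. This is a hypothetical illustration, not the authors' deployment code: `predict_tile` stands in for the trained fusion model and is assumed to return a per-pixel building probability for a square tile; the input is simplified to a single-channel 2-D grid; tile size and stride are free parameters; and overlapping predictions are merged by simple averaging, which is one common choice rather than something the abstract specifies.

```python
from __future__ import annotations

from typing import Callable


def tile_starts(length: int, tile: int, stride: int) -> list[int]:
    """Start offsets of windows of size `tile` covering [0, length) with step `stride`."""
    if tile > length:
        raise ValueError("region is smaller than one tile; pad it first")
    starts = list(range(0, length - tile + 1, stride))
    # Shift a final window back so it ends exactly at the region edge.
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def stitch(
    region: list[list[float]],
    tile: int,
    stride: int,
    predict_tile: Callable[[list[list[float]]], list[list[float]]],
) -> list[list[float]]:
    """Run a (hypothetical) tile model over overlapping windows and average overlaps."""
    if not 0 < stride <= tile:
        raise ValueError("need 0 < stride <= tile so that every pixel is covered")
    height, width = len(region), len(region[0])
    acc = [[0.0] * width for _ in range(height)]  # sum of predictions per pixel
    cnt = [[0] * width for _ in range(height)]  # number of windows covering each pixel
    for r0 in tile_starts(height, tile, stride):
        for c0 in tile_starts(width, tile, stride):
            window = [row[c0:c0 + tile] for row in region[r0:r0 + tile]]
            probs = predict_tile(window)  # assumed shape: tile x tile
            for i in range(tile):
                for j in range(tile):
                    acc[r0 + i][c0 + j] += probs[i][j]
                    cnt[r0 + i][c0 + j] += 1
    return [[acc[r][c] / cnt[r][c] for c in range(width)] for r in range(height)]


if __name__ == "__main__":
    # Toy check with an identity "model": stitching must reproduce the input.
    region = [[float(r * 7 + c) for c in range(7)] for r in range(5)]
    out = stitch(region, tile=3, stride=2, predict_tile=lambda t: t)
    print(out == region)  # True
```

With a stride smaller than the tile size, interior pixels are predicted several times, which tends to smooth out artifacts near tile borders; the last window in each direction is shifted back to the region edge so that no pixels are left uncovered.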