MapViT: A Two-Stage ViT-Based Framework for Real-Time Radio Quality Map Prediction in Dynamic Environments

Recent advances in mobile and wireless networks are unlocking the full potential of robotic autonomy, enabling robots to take advantage of ultra-low latency, high data throughput, and ubiquitous connectivity. However, for robots to navigate and operate seamlessly, efficiently, and reliably, they must have an accurate understanding of both their surrounding environment and the quality of radio signals. Achieving this in highly dynamic environments remains a challenging and largely unsolved problem. In this paper, we introduce MapViT, a two-stage Vision Transformer (ViT)-based framework inspired by the success of the pre-train and fine-tune paradigm for Large Language Models (LLMs). MapViT is designed to predict both environmental changes and expected radio signal quality. We evaluate the framework using a set of representative Machine Learning (ML) models, analyzing their respective strengths and limitations across different scenarios. Experimental results demonstrate that the proposed two-stage pipeline enables real-time prediction, with the ViT-based implementation achieving a strong balance between accuracy and computational efficiency. This makes MapViT a promising solution for energy- and resource-constrained platforms such as mobile robots. Moreover, the geometry foundation model derived from the self-supervised pre-training stage improves data efficiency and transferability, enabling effective downstream predictions even with limited labeled data. Overall, this work lays the foundation for next-generation digital twin ecosystems and paves the way for a new class of ML foundation models driving multi-modal intelligence in future 6G-enabled systems.
