Foundation deep learning (DL) models are general-purpose models designed to learn robust, adaptable representations of their target modality, enabling fine-tuning across a range of downstream tasks. These models are pretrained on large, unlabeled datasets using self-supervised learning (SSL). Foundation models have demonstrated better generalization than traditional supervised approaches, a critical requirement for wireless communications, where dynamic environments demand model adaptability. In this work, we propose and demonstrate the effectiveness of a Vision Transformer (ViT) as a radio foundation model for spectrogram learning. We introduce a Masked Spectrogram Modeling (MSM) approach to pretrain the ViT in a self-supervised fashion. We evaluate the ViT-based foundation model on two downstream tasks: Channel State Information (CSI)-based human activity sensing and spectrogram segmentation. Experimental results demonstrate performance competitive with supervised training while generalizing across diverse domains. Notably, the pretrained ViT outperforms a four-times-larger model trained from scratch on the spectrogram segmentation task while requiring significantly less training time, and achieves competitive performance on the CSI-based human activity sensing task. This work demonstrates that MSM pretraining of ViTs is a promising technique for scalable foundation model development in future 6G networks.
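To make the MSM idea concrete, the following is a minimal NumPy sketch of the masking step: a spectrogram is split into non-overlapping patches, a large fraction of patches is hidden, and a reconstruction loss is computed on the masked patches only. All sizes here (64×64 spectrogram, 8×8 patches, 0.75 mask ratio) are illustrative assumptions, not values from this work, and the zero-filled "prediction" is a stand-in for the ViT decoder's output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy spectrogram: 64 time bins x 64 frequency bins (illustrative sizes).
spec = rng.standard_normal((64, 64)).astype(np.float32)

def patchify(x, p=8):
    """Split a (H, W) spectrogram into non-overlapping p x p patches,
    returning an array of shape (num_patches, p * p)."""
    h, w = x.shape
    return x.reshape(h // p, p, w // p, p).swapaxes(1, 2).reshape(-1, p * p)

patches = patchify(spec)           # (64, 64) patch tokens for p = 8
num_patches = patches.shape[0]

# Randomly mask a large fraction of patches (ratio is an assumed value).
mask_ratio = 0.75
num_masked = int(num_patches * mask_ratio)
mask = np.zeros(num_patches, dtype=bool)
mask[rng.choice(num_patches, size=num_masked, replace=False)] = True

# A real MSM model would encode only the visible patches with the ViT and
# predict the masked ones; a zero-filled stand-in prediction shows how the
# loss is restricted to the masked patches.
pred = np.zeros_like(patches)
loss = np.mean((pred[mask] - patches[mask]) ** 2)
print(f"{num_masked} of {num_patches} patches masked, masked-patch MSE = {loss:.3f}")
```

Restricting the loss to masked patches forces the encoder to infer hidden time-frequency content from visible context, which is the source of the transferable representations described above.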