This paper presents a novel approach to multimodal data fusion based on the Vector-Quantized Variational Autoencoder (VQVAE) architecture. The proposed method is simple yet effective, achieving excellent reconstruction performance on paired MNIST-SVHN data and on WiFi spectrogram data. The multimodal VQVAE model is also extended to a 5G communication scenario, in which an end-to-end Channel State Information (CSI) feedback system compresses the data transmitted between the base station (eNodeB) and the User Equipment (UE) without significant loss of performance. The proposed model learns a discriminative compressed feature space for various types of input data (CSI, spectrograms, natural images, etc.), making it a suitable solution for applications with limited computational resources.
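To illustrate why vector quantization suits a feedback link, the following is a minimal sketch of the quantization bottleneck alone, not the model described in this paper. It assumes a tiny hand-written codebook and hypothetical two-dimensional encoder outputs; the function names (`quantize`, `dequantize`, `bits_per_index`) are illustrative only. In a trained VQVAE the codebook and the encoder are learned jointly.

```python
import math

def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook entry
    (squared Euclidean distance)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices

def dequantize(indices, codebook):
    """Recover the quantized vectors on the receiving side by table lookup."""
    return [codebook[i] for i in indices]

def bits_per_index(codebook_size):
    """Bits needed to send one codebook index over the feedback link."""
    return math.ceil(math.log2(codebook_size))

# Hypothetical codebook shared by both ends of the link (assumed, not learned here).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]

# Hypothetical encoder outputs, e.g. features computed at the UE from a CSI matrix.
features = [[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]]

indices = quantize(features, codebook)          # only these integers are transmitted
reconstructed = dequantize(indices, codebook)   # the receiver looks the vectors up again

print(indices)
print(bits_per_index(len(codebook)), "bits per transmitted index")
```

Only the integer indices cross the link, so the payload per feature vector is fixed by the codebook size rather than by the precision of the original features.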