Multi-Modal Sensing and Fusion in mmWave Beamforming for Connected Vehicles: A Transformer Based Framework

Millimeter wave (mmWave) communication, which relies on beamforming to overcome its inherent path loss, is considered one of the key technologies for supporting the ever-increasing high-throughput and low-latency demands of connected vehicles. However, adopting the standard-defined beamforming approach in highly dynamic vehicular environments often incurs high beam training overhead and reduces the airtime available for communication, mainly because of pilot signal exchange and exhaustive beam measurements. To this end, we present a multi-modal sensing and fusion learning framework as a potential alternative for reducing such overhead. In this framework, we first extract representative features from the sensing modalities using modality-specific encoders, then apply multi-head cross-modal attention to learn the dependencies and correlations between modalities, and finally fuse the multimodal features to predict the top-k beams so that the best line-of-sight links can be established proactively. To show the generalizability of the proposed framework, we perform a comprehensive experiment in four different vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) scenarios using real-world multimodal and 60 GHz mmWave wireless sensing data. The experiment reveals that the proposed framework (i) achieves up to 96.72% accuracy in predicting the top-15 beams, (ii) incurs roughly 0.77 dB average power loss, and (iii) improves overall latency and beam search space overhead by 86.81% and 76.56%, respectively, for top-15 beams compared to the standard-defined approach.
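To make the top-k evaluation concrete, the following is a minimal sketch of how top-k beam selection, top-k accuracy, and the search-space reduction could be computed. It is not the authors' implementation: the function names and the toy scores are hypothetical, and the 64-beam codebook size is an assumption not stated in the abstract. Under that assumption, sweeping only 15 candidates instead of 64 gives 1 - 15/64 = 0.765625, which agrees with the reported 76.56% figure.

```python
def top_k_beams(scores, k):
    """Return the indices of the k highest-scoring beams, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_accuracy(score_rows, best_beams, k):
    """Fraction of samples whose true best beam is among the k predicted beams."""
    hits = sum(
        best in top_k_beams(scores, k)
        for scores, best in zip(score_rows, best_beams)
    )
    return hits / len(best_beams)


def search_space_reduction(k, codebook_size):
    """Relative reduction in beams measured when only k candidates are swept."""
    return 1.0 - k / codebook_size


# Hypothetical per-beam scores for two samples over a 4-beam toy codebook.
TOY_SCORES = [
    [0.1, 0.7, 0.2, 0.0],
    [0.4, 0.1, 0.3, 0.2],
]
TOY_BEST = [1, 2]  # hypothetical ground-truth best beam index per sample

top1 = top_k_accuracy(TOY_SCORES, TOY_BEST, k=1)
top2 = top_k_accuracy(TOY_SCORES, TOY_BEST, k=2)

# Assumed 64-beam codebook (not given in the abstract).
reduction = search_space_reduction(k=15, codebook_size=64)
```

In this toy case the second sample's true beam is only recovered once k grows from 1 to 2, which illustrates the trade-off that the top-k formulation exposes: a larger k raises accuracy but shrinks the saving in beam measurements.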
