This article studies image segmentation-based semantic communication in autonomous driving. In real traffic scenes, detecting key objects (e.g., vehicles, pedestrians, and obstacles) matters more for driving safety than detecting other objects. Therefore, we propose a vehicular image segmentation-oriented semantic communication system, termed VIS-SemCom, in which the image segmentation features of important objects are transmitted to reduce transmission redundancy. First, to accurately extract image semantics, we develop a semantic codec based on the Swin Transformer architecture, which expands the perceptual field and thus improves segmentation accuracy. Next, we propose a multi-scale semantic extraction scheme that assigns different numbers of Swin Transformer blocks to features of different resolutions, thereby improving the segmentation accuracy of important objects. Furthermore, an importance-aware loss is invoked to emphasize the important objects, and an online hard sample mining (OHEM) strategy is proposed to handle small sample issues in the dataset. Experimental results demonstrate that, compared with a traditional transmission scheme, the proposed VIS-SemCom achieves a coding gain of nearly 6 dB at a 60% mean intersection over union (mIoU), reduces the amount of transmitted data by up to 70% at a 60% mIoU, and improves the segmentation intersection over union (IoU) of important objects by 4%.
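The results above are reported as IoU and mIoU. For readers unfamiliar with these metrics, the following is a minimal sketch of how they are commonly computed from predicted and ground-truth label maps. It is not the authors' evaluation code: the function names, the toy label maps, and the class ids (0 = road, 1 = vehicle, 2 = pedestrian) are assumptions made purely for illustration.

```python
def per_class_iou(pred, target, num_classes):
    """Return the IoU of each class; None if a class appears in neither map."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps (intersection) or in either map (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(ious):
    """Average the IoU over the classes that actually occur."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else None


# Toy 2x4 label maps, flattened row by row (hypothetical class ids:
# 0 = road, 1 = vehicle, 2 = pedestrian).
target = [0, 0, 1, 1, 0, 2, 1, 1]
pred = [0, 0, 1, 1, 0, 0, 1, 2]

ious = per_class_iou(pred, target, num_classes=3)
print(ious)            # per-class IoU
print(mean_iou(ious))  # mIoU over all classes

# IoU restricted to the classes treated as "important" in this toy example.
important = [1, 2]
print(mean_iou([ious[c] for c in important]))
```

In this toy case the single pedestrian pixel is missed, so its class IoU drops to zero even though most pixels are labelled correctly. This illustrates why a per-class view of important objects can differ sharply from the overall mIoU.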