This paper studies an end-to-end video semantic communication system for massive communication. In the considered system, the transmitter must continuously send video to the receiver to facilitate character reconstruction in immersive applications such as interactive video conferencing. However, transmitting the original video, which contains a substantial amount of data, strains the limited wireless resources. To address this issue, we reduce the amount of transmitted data by having the transmitter extract and send the semantic information of the video, which captures the major object and the spatiotemporal correlation in the video. Specifically, we first develop a video semantic communication system based on major object extraction (MOE) and contextual video encoding (CVE) to achieve efficient video transmission. Then, we design the MOE and CVE modules using convolutional neural network (CNN) based motion estimation, contextual extraction, and entropy coding. Simulation results show that, compared to traditional coding schemes, the proposed method can reduce the amount of transmitted data by up to 25% while increasing the peak signal-to-noise ratio (PSNR) of the reconstructed video by up to 14%.
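
For readers unfamiliar with the reported quality metric, the following is a minimal sketch of how PSNR is conventionally computed between a reference frame and its reconstruction. It is not taken from the paper: the function name `psnr`, the flat-sequence frame representation, and the 8-bit peak value of 255 are assumptions made for illustration, and the abstract does not state how the reported 14% gain was measured.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Return the PSNR in dB between two equal-length sequences of pixel values.

    Hypothetical helper for illustration; max_value=255 assumes 8-bit pixels.
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        # Identical frames: PSNR is unbounded.
        return math.inf
    # PSNR = 10 * log10(MAX^2 / MSE)
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    decoded = [11, 19, 31, 39]  # every pixel is off by one, so MSE = 1
    print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

A higher PSNR indicates a reconstruction closer to the original frame. For video, the per-frame values are commonly averaged over the sequence.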