Ultra-high-resolution streaming and emerging immersive services are driving a rapid increase in wireless video traffic. However, perceptually pleasing video transmission over bandwidth-limited and latency-constrained wireless links remains challenging for conventional separated source-channel systems, which primarily target bit-level reliability and often degrade under short-blocklength transmission. In addition, pixel-level distortion optimization does not necessarily align with human perception, and existing learned video codecs may incur high complexity and raise deployment issues. This paper proposes PVSC, a perception-aware video semantic communication framework for real-time wireless video transmission. PVSC eliminates explicit motion-vector transmission and exploits spatio-temporal feature coding to generate compact, channel-robust symbol streams. It also specifies side-information formatting, reference-buffer management, and lightweight rate control, enabling stable receiver-side reconstruction and bandwidth-adaptive inference with a single model. Extensive experiments show that PVSC outperforms the compared baselines across diverse datasets, resolutions, GOP configurations, and channel conditions. Compared with the engineered "VTM + 5G LDPC" baseline, PVSC saves up to approximately 75% and 87% of the bandwidth at comparable LPIPS and DISTS, respectively, while supporting real-time inference on a single NVIDIA RTX 4090 GPU.