With the exponential growth of video traffic, traditional video streaming systems are approaching their limits in compression efficiency and communication capacity. To further reduce bitrate while maintaining quality, we propose Promptus, a disruptive semantic communication system that streams prompts instead of video content: it represents real-world video frames as a series of "prompts" for delivery and employs Stable Diffusion to generate the video at the receiver. To ensure that the generated video is pixel-aligned with the original, we propose a gradient descent-based prompt fitting framework. We further introduce a low-rank decomposition-based bitrate control algorithm to achieve adaptive bitrate and, for inter-frame compression, an interpolation-aware fitting algorithm. Evaluations across various video genres demonstrate that, compared to H.265, Promptus achieves more than a 4x bandwidth reduction while preserving the same perceptual quality. At extremely low bitrates, Promptus improves perceptual quality by 0.139 and 0.118 (in LPIPS) compared to VAE and H.265, respectively, and decreases the ratio of severely distorted frames by 89.3% and 91.7%. Our work opens up a new paradigm for efficient video communication. Promptus is open-sourced at: https://github.com/JiangkaiWu/Promptus.
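The idea behind low-rank bitrate control can be illustrated with a minimal sketch. It is not the Promptus implementation: the abstract does not specify how prompts are factorized, so the sketch assumes a prompt is a dense real-valued matrix and swaps in a generic truncated SVD, which gives the best rank-r approximation in the Frobenius norm. The matrix shape, the function names, and the budget measured in "number of values" are all hypothetical.

```python
import numpy as np

def low_rank_factors(prompt, rank):
    """Factor `prompt` as A @ B, the best rank-`rank` approximation (truncated SVD)."""
    u, s, vt = np.linalg.svd(prompt, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (rows, rank), singular values folded in
    b = vt[:rank, :]             # shape (rank, cols)
    return a, b

def values_to_send(shape, rank):
    """Number of scalars in the two factors, a crude proxy for bitrate."""
    rows, cols = shape
    return rank * (rows + cols)

def pick_rank(shape, budget_values):
    """Largest rank whose factors fit the budget, clamped to [1, min(rows, cols)]."""
    rows, cols = shape
    return max(1, min(min(rows, cols), budget_values // (rows + cols)))

# Hypothetical prompt matrix; the real prompt shape is not given in the abstract.
rng = np.random.default_rng(0)
prompt = rng.standard_normal((64, 256))

# A lower budget yields a lower rank, fewer transmitted values, and a larger error.
for budget in (1000, 5000, 20000):
    r = pick_rank(prompt.shape, budget)
    a, b = low_rank_factors(prompt, r)
    err = np.linalg.norm(prompt - a @ b) / np.linalg.norm(prompt)
    print(f"budget={budget} rank={r} sent={values_to_send(prompt.shape, r)} rel_err={err:.3f}")
```

Under these assumptions the rank acts as a single knob: the sender transmits the two factors instead of the full matrix, and the receiver multiplies them back together before generation. In a fitted system the factors would more plausibly be optimized directly rather than obtained by SVD after the fact, but that detail is not stated in the abstract.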