With the exponential growth of video traffic, traditional video streaming systems are approaching their limits in compression efficiency and communication capacity. To further reduce bitrate while maintaining quality, we propose Promptus, a disruptive, novel system that streams prompts instead of video content: it uses Stable Diffusion to convert video frames into a series of "prompts" for delivery. To ensure pixel alignment, we propose a gradient descent-based prompt fitting framework. To achieve adaptive bitrate for prompts, we introduce a low-rank decomposition-based bitrate control algorithm. For inter-frame compression of prompts, we propose a temporal smoothing-based prompt interpolation algorithm. Evaluations across various video domains and real network traces demonstrate that Promptus improves perceptual quality by 0.111 and 0.092 (in LPIPS) compared to VAE and H.265, respectively, and decreases the ratio of severely distorted frames by 89.3% and 91.7%. Moreover, Promptus achieves real-time video generation from prompts at over 150 FPS. To the best of our knowledge, Promptus is the first attempt to replace video codecs with prompt inversion and the first to use prompt streaming instead of video streaming. Our work opens up a new paradigm for efficient video communication beyond the Shannon limit.
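The bitrate control and inter-frame ideas named in the abstract can be illustrated with a minimal sketch. It is not the Promptus implementation: it assumes a prompt is a real-valued matrix of shape 77 x 768 (a hypothetical shape chosen for illustration), uses truncated SVD as a stand-in for the low-rank decomposition, and uses plain linear interpolation as a stand-in for the temporal smoothing-based interpolation. The function names are likewise hypothetical.

```python
import numpy as np


def low_rank_factors(prompt, rank):
    """Truncated SVD: return U (tokens x rank) and V (rank x dim) with U @ V ~ prompt."""
    u, s, vt = np.linalg.svd(prompt, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]


def payload_size(tokens, dim, rank):
    """Number of scalars sent for one prompt when only the two factors are transmitted."""
    return rank * (tokens + dim)


def interpolate_prompts(key_a, key_b, num_between):
    """Linearly interpolate num_between prompts strictly between two keyframe prompts."""
    weights = np.linspace(0.0, 1.0, num_between + 2)[1:-1]
    return [(1.0 - w) * key_a + w * key_b for w in weights]


# Assumed prompt shape, for illustration only.
tokens, dim = 77, 768
rng = np.random.default_rng(0)
prompt = rng.standard_normal((tokens, dim))

# A lower rank means fewer scalars to send but a larger approximation error.
errors = {}
for rank in (4, 8, 16):
    u, v = low_rank_factors(prompt, rank)
    errors[rank] = np.linalg.norm(prompt - u @ v) / np.linalg.norm(prompt)

# Only keyframe prompts are sent; the receiver fills in the frames between them.
next_prompt = rng.standard_normal((tokens, dim))
between = interpolate_prompts(prompt, next_prompt, num_between=3)
```

Under these assumptions, a rank-8 factorization carries 8 x (77 + 768) = 6760 scalars per prompt instead of the 59136 in the full matrix, and the rank acts as the knob that trades payload size against fidelity. In the system described above, prompts are fitted by gradient descent against the target frames, which this sketch does not attempt to reproduce.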