Mirage: Transmitting a Video as a Perceptual Illusion for 50,000X Speedup

Existing communication frameworks mainly aim to reconstruct source signals accurately in order to ensure reliable transmission. However, this signal-level, fidelity-oriented design often incurs high communication overhead and system complexity. The cost is especially pronounced in video communication, where mainstream frameworks transmit the visual data itself and therefore consume significant bandwidth. To address this issue, we propose Mirage, a visual-data-free communication framework for extremely efficient video transmission that preserves semantic information. Mirage decomposes video content into two complementary components: temporal sequence information that captures motion dynamics, and spatial appearance representations that describe the overall visual structure. Temporal information is preserved through video captioning, while key frames are encoded into compact semantic representations of spatial appearance. These representations are transmitted to the receiver, where the video is synthesized by generative video models. Since no raw visual data is transmitted, Mirage is inherently privacy-preserving. Mirage also supports personalized adaptation across deployment scenarios: the sender, network, and receiver can independently impose constraints on semantic representation, transmission, and generation, enabling flexible trade-offs among efficiency, privacy, control, and perceptual quality. Experimental results in video transmission demonstrate that Mirage achieves up to a 50,000X data-level compression speedup over raw video transmission, with gains expected to scale with larger video content sizes.
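The data-level comparison can be illustrated with a minimal sketch of the arithmetic, assuming the transmitted payload consists only of a caption string and one compact code per key frame. The names `SemanticPayload`, `raw_video_bytes`, and `compression_ratio` are hypothetical, and the resolution, duration, caption, and code sizes below are made-up illustrative values, not figures or settings from the Mirage experiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SemanticPayload:
    """Hypothetical container for what a sender transmits instead of pixels."""
    caption: str                 # temporal information, described in text
    keyframe_codes: List[bytes]  # one compact semantic code per key frame

    def size_bytes(self) -> int:
        # Total payload size: UTF-8 caption plus all key-frame codes.
        caption_size = len(self.caption.encode("utf-8"))
        codes_size = sum(len(code) for code in self.keyframe_codes)
        return caption_size + codes_size


def raw_video_bytes(width: int, height: int, fps: int, seconds: int,
                    bytes_per_pixel: int = 3) -> int:
    """Size of uncompressed video, assuming 8-bit RGB by default."""
    return width * height * bytes_per_pixel * fps * seconds


def compression_ratio(raw_bytes: int, payload_bytes: int) -> float:
    """How many times smaller the payload is than the raw video."""
    return raw_bytes / payload_bytes


# Illustrative values only (assumptions, not taken from the paper).
raw = raw_video_bytes(width=1280, height=720, fps=30, seconds=10)
payload = SemanticPayload(
    caption="A dog runs across a beach at sunset.",
    keyframe_codes=[bytes(512), bytes(512)],  # two placeholder 512-byte codes
)
ratio = compression_ratio(raw, payload.size_bytes())
print(f"raw={raw} B, payload={payload.size_bytes()} B, ratio={ratio:.0f}x")
```

Because the raw size grows with resolution, frame rate, and duration while a caption and a few key-frame codes stay roughly constant in size, a ratio computed this way increases with larger videos, which matches the scaling behavior the abstract anticipates.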
