Semantic communication is expected to be a core component of next-generation AI-based communications. One of the possibilities it offers is the ability to regenerate, at the destination, images or videos that are semantically equivalent to the transmitted ones, without necessarily recovering the transmitted sequence of bits. Current solutions still lack the ability to build complex scenes from partial received information. There is therefore an unmet need to balance the effectiveness of generation methods against the complexity of the transmitted information, possibly taking the goal of the communication into account. In this paper, we aim to bridge this gap by proposing a novel generative diffusion-guided framework for semantic communication that leverages the strong ability of diffusion models to synthesize multimedia content while preserving semantic features. We reduce bandwidth usage by sending only highly compressed semantic information. The diffusion model then learns to synthesize semantically consistent scenes from the received semantic information, after denoising, through spatially-adaptive normalizations. We show, through an in-depth assessment of multiple scenarios, that our method outperforms existing solutions in generating high-quality images with preserved semantic information, even when the received content is significantly degraded. More specifically, our results show that objects, locations, and depths remain recognizable even under extremely noisy channel conditions. The code is available at https://github.com/ispamm/GESCO.
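To make the idea of spatially-adaptive normalization concrete, the following is a minimal sketch and not the implementation in the linked repository. It normalizes a single feature channel and then rescales and shifts each pixel according to the semantic class at that location. The function name `spade_normalize` and the per-class lookup tables `gamma` and `beta` are assumptions made for illustration; in a trained network the scale and shift would typically be produced by learned convolutions applied to the semantic map.

```python
import math


def spade_normalize(features, label_map, gamma, beta, eps=1e-5):
    """Normalize one feature channel, then modulate it per pixel by class.

    features : 2D list of floats (one channel, H x W)
    label_map: 2D list of int class ids (H x W), i.e. the semantic map
    gamma    : dict class id -> scale (hypothetical lookup table)
    beta     : dict class id -> shift (hypothetical lookup table)
    """
    # Parameter-free normalization over the whole channel.
    values = [v for row in features for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)

    # Spatially-adaptive modulation: scale and shift depend on the
    # semantic class found at each pixel.
    out = []
    for feat_row, label_row in zip(features, label_map):
        out.append([
            gamma[c] * ((v - mean) / std) + beta[c]
            for v, c in zip(feat_row, label_row)
        ])
    return out


# Toy usage: a 2 x 2 feature channel whose top row is class 0
# and whose bottom row is class 1.
features = [[1.0, 2.0], [3.0, 4.0]]
label_map = [[0, 0], [1, 1]]
gamma = {0: 1.0, 1: 2.0}
beta = {0: 0.0, 1: 0.5}
modulated = spade_normalize(features, label_map, gamma, beta)
```

Because the scale and shift vary with the semantic map, the layout information is re-injected at every normalization step rather than being washed out, which is the usual motivation for this kind of conditioning.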