Diffusion models have been extensively applied to AI-generated content (AIGC) in recent years, thanks to their superior generation capabilities. Combined with semantic communications, diffusion models are used for tasks such as denoising, data reconstruction, and content generation. However, existing diffusion-based generative models do not account for stringent bandwidth limitations, which restricts their application in wireless communication. This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our architecture exploits the diffusion model, in which signal transmission through the wireless channel acts as the forward diffusion process. To reduce bandwidth requirements, we incorporate a downsampling module and, at the receiver, a paired upsampling module based on a variational auto-encoder with reparameterization, ensuring that the recovered features conform to a Gaussian distribution. Furthermore, we derive the loss function for the proposed system and evaluate its performance through comprehensive experiments. The experimental results demonstrate significant improvements in pixel-level metrics such as peak signal-to-noise ratio (PSNR) and semantic metrics such as learned perceptual image patch similarity (LPIPS), and these gains become more pronounced across compression rates and SNRs when compared with deep joint source-channel coding (DJSCC).
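The pipeline summarized above (downsample to a compact latent, apply the VAE reparameterization so the transmitted features are Gaussian, let the noisy channel play the role of a forward diffusion step, then upsample at the receiver) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the average-pooling `downsample`, the fixed log-variance, the AWGN model, and the nearest-neighbour `upsample` are all simplifying assumptions standing in for learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample(x, factor=2):
    # Average-pool adjacent entries: hypothetical stand-in for the learned
    # downsampling encoder that cuts the number of transmitted symbols.
    return x.reshape(-1, factor).mean(axis=1)

def reparameterize(mu, log_var, rng):
    # Standard VAE reparameterization trick: z = mu + sigma * eps,
    # which keeps the latent features Gaussian-distributed.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def awgn_channel(z, snr_db, rng):
    # The wireless channel adds Gaussian noise, acting like one step
    # of the forward diffusion process.
    power = np.mean(z ** 2)
    noise_var = power / (10 ** (snr_db / 10))
    return z + np.sqrt(noise_var) * rng.standard_normal(z.shape)

def upsample(z, factor=2):
    # Nearest-neighbour expansion: stand-in for the learned upsampler
    # that restores the original feature dimension at the receiver.
    return np.repeat(z, factor)

x = rng.standard_normal(64)          # toy feature vector
mu = downsample(x)                   # 64 -> 32 symbols: 2x bandwidth reduction
log_var = np.full_like(mu, -2.0)     # illustrative fixed log-variance
z = reparameterize(mu, log_var, rng)
y = awgn_channel(z, snr_db=10, rng=rng)
x_hat = upsample(y)                  # recovered features, back at full size
print(x.shape, z.shape, x_hat.shape)
```

Only half as many symbols cross the channel as there are input features, which is the bandwidth saving the compression modules target; in the actual system the receiver would further denoise `x_hat` with the reverse diffusion process.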