In the emerging paradigm of semantic communication (SC), the focus shifts from delivering bits to delivering the meaning behind them, by extracting semantic information from raw data. Recent advances in data-to-text models enable language-oriented SC, notably text-transformed image communication via image-to-text (I2T) encoding and text-to-image (T2I) decoding. However, although semantically aligned, text is too coarse to precisely capture fine-grained visual features such as spatial location, color, and texture, incurring a significant perceptual gap between the intended and reconstructed images. To address this limitation, in this paper we propose a novel language-oriented SC framework that communicates both text and a compressed image embedding, and combines them using a latent diffusion model to reconstruct the intended image. Experimental results validate the potential of our approach, which transmits only 2.09\% of the original image size while achieving higher perceptual similarity over noisy communication channels than a baseline SC method that communicates through text alone. The code is available at https://github.com/ispamm/Img2Img-SC/ .
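The transmitter-receiver flow described above (an I2T caption plus a compressed latent embedding, sent over a noisy channel and fused by a T2I decoder) can be sketched structurally as follows. This is a minimal sketch with stand-in components, not the paper's actual models: the hard-coded caption replaces a real I2T captioner, the uniform 8-bit quantizer replaces the paper's embedding compression, and the AWGN channel and latent shape are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def i2t_encode(image):
    # Stand-in for an image-to-text captioner; a real system would
    # generate this caption from the input image.
    return "a red car parked on the left side of a narrow street"

def compress_embedding(latent, bits=8):
    # Illustrative uniform quantization of the latent image embedding.
    lo, hi = latent.min(), latent.max()
    q = np.round((latent - lo) / (hi - lo) * (2**bits - 1)).astype(np.uint8)
    return q, (lo, hi)

def awgn(symbols, snr_db):
    # Additive white Gaussian noise channel at the given SNR (in dB).
    power = np.mean(symbols.astype(np.float64) ** 2)
    noise_var = power / 10 ** (snr_db / 10)
    return symbols + rng.normal(0.0, np.sqrt(noise_var), symbols.shape)

# Transmitter: caption + compressed latent embedding of the image.
latent = rng.standard_normal((4, 8, 8))  # toy latent; shape is an assumption
caption = i2t_encode(None)
quantized, (lo, hi) = compress_embedding(latent)

# Channel: the embedding traverses a noisy channel
# (the short text payload is assumed error-protected here).
received = awgn(quantized.astype(np.float64), snr_db=20)

# Receiver: de-quantize; caption + recovered latent would then condition
# a latent diffusion (T2I) model to reconstruct the intended image.
dequantized = np.clip(received, 0, 255) / 255 * (hi - lo) + lo
print(caption, dequantized.shape)
```

The key design point the sketch mirrors is that the two payloads are complementary: the text carries coarse semantics cheaply, while the compressed embedding restores the fine-grained visual detail (layout, color, texture) that text alone loses.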