Semantic communication focuses on conveying task-relevant meaning rather than exact bitwise recovery. For image transmission with a generative receiver, relying only on text descriptions can be insufficient to preserve instance-specific visual evidence, whereas sending dense latent representations can incur substantial overhead. This paper presents a receiver-driven closed-loop scheme that transmits a short caption together with an initial sparse subset of latent blocks, and then uses feedback to request additional blocks only when needed. At each round, the receiver reconstructs the image via latent diffusion inpainting and checks semantic consistency by comparing a caption generated from the reconstruction with the received caption, using a lightweight language similarity score such as ROUGE-L. The receiver stops early once a target consistency level is met; otherwise, it requests a small number of additional latent blocks to refine the reconstruction. Experiments on Flickr30k over AWGN channels demonstrate a controllable rate-quality tradeoff. Adaptive feedback achieves the strongest semantic alignment and the lowest failure rate, outperforming budget-matched one-shot transmission while typically using fewer latent blocks than always-on retransmission.
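The receiver-side loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the callables `reconstruct`, `describe`, and `request_blocks` are hypothetical stand-ins for the latent diffusion inpainting model, the captioning model, and the feedback channel, and the threshold, round limit, and blocks-per-round values are placeholder assumptions. The similarity score is assumed here to be a ROUGE-L F1 computed over lowercase whitespace tokens; the abstract does not specify the exact variant or the block selection policy.

```python
from typing import Any, Callable, Iterable, List, Sequence, Set, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 on lowercase whitespace tokens (assumed variant)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def closed_loop_receive(
    caption: str,
    initial_blocks: Iterable[int],
    request_blocks: Callable[[Set[int], int], List[int]],  # hypothetical feedback channel
    reconstruct: Callable[[str, Set[int]], Any],            # hypothetical inpainting model
    describe: Callable[[Any], str],                         # hypothetical captioning model
    threshold: float = 0.5,      # assumed target consistency level
    max_rounds: int = 4,         # assumed cap on feedback rounds
    blocks_per_round: int = 2,   # assumed size of each request
) -> Tuple[Any, float, Set[int]]:
    """Reconstruct, check caption consistency, and request more blocks if needed."""
    received = set(initial_blocks)
    image, score = None, 0.0
    for round_idx in range(max_rounds + 1):
        image = reconstruct(caption, received)
        score = rouge_l_f1(describe(image), caption)
        # Stop early once the target is met or the round budget is spent.
        if score >= threshold or round_idx == max_rounds:
            break
        new_blocks = request_blocks(received, blocks_per_round)
        if not new_blocks:  # transmitter has nothing left to send
            break
        received |= set(new_blocks)
    return image, score, received
```

In this sketch the consistency threshold acts as the knob for the rate-quality tradeoff: a higher threshold tends to trigger more feedback rounds and therefore more transmitted latent blocks, while a lower one stops earlier.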