This paper proposes a new communication system framework that leverages the generation capabilities of multi-modal generative models. In today's smart applications, communication can succeed by conveying the perceptual meaning of the data, which we represent as a text prompt. Text is a suitable semantic representation of image data because multi-modal techniques now use text to instruct or generate images, interpreting it in a manner similar to human cognition. Transmitting text also reduces the communication load compared with transmitting the original data. The transmitter converts the target image to text through a multi-modal generation process, and the receiver reconstructs the image using the reverse process. Each word in the sentence has its own syntactic role and is responsible for a particular piece of the information the text contains. To further reduce the communication load, the transmitter sends words sequentially, in order of how much information they carry, until communication succeeds. Our primary focus is therefore on the design of a communication system based on image-to-text transformation and on the proposed schemes for sequentially transmitting word tokens. Our work is expected to pave the way for applying state-of-the-art generative models to real communication systems.
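The sequential word transmission idea can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the paper's method: `transmit_by_priority`, the per-word `priority` scores, and the `is_success` callback are assumptions made for illustration. In a real system the scores would come from whatever word-importance measure the scheme defines, and the success check would compare the image reconstructed at the receiver against the target image. The toy check below only tests whether a fixed set of key words has arrived.

```python
from typing import Callable, List, Sequence, Tuple


def transmit_by_priority(
    words: Sequence[str],
    priority: Sequence[float],
    is_success: Callable[[List[str]], bool],
) -> Tuple[List[str], int]:
    """Send words one at a time, highest priority first, until is_success holds.

    Returns the words held by the receiver (in sentence order) and the
    number of words sent. Hypothetical helper, not the paper's algorithm.
    """
    # Indices sorted by descending priority; the sort is stable, so ties
    # keep their original sentence order.
    order = sorted(range(len(words)), key=lambda i: -priority[i])
    sent_idx: List[int] = []
    for i in order:
        sent_idx.append(i)
        # The receiver restores sentence order before attempting reconstruction.
        received = [words[j] for j in sorted(sent_idx)]
        if is_success(received):
            break
    received = [words[j] for j in sorted(sent_idx)]
    return received, len(sent_idx)


# Toy example: caption words with assumed (made-up) importance scores.
caption = ["a", "brown", "dog", "runs", "on", "the", "beach"]
scores = [0.05, 0.5, 1.0, 0.7, 0.05, 0.05, 0.8]

# Assumed stand-in for "the reconstructed image is close enough":
# communication succeeds once these key words have been received.
required = {"dog", "runs", "beach"}


def toy_success(received: List[str]) -> bool:
    return required.issubset(received)


received, n_sent = transmit_by_priority(caption, scores, toy_success)
print(received, n_sent)
```

In this toy run, three of the seven words are sent before the stopping condition is met, which shows how ordering by priority can shorten the transmission. The saving in any real setting depends entirely on the scoring rule and the success criterion, both of which are assumed here.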