De-Diffusion Makes Text a Strong Cross-Modal Interface

We demonstrate text as a strong cross-modal interface. Rather than relying on deep embeddings as the interface representation that connects image and language, our approach represents an image as text, which brings the interpretability and flexibility inherent to natural language. We employ an autoencoder that uses a pre-trained text-to-image diffusion model for decoding. The encoder is trained to transform an input image into text, which is then fed into the fixed text-to-image diffusion decoder to reconstruct the original input, a process we term De-Diffusion. Experiments validate both the precision and the comprehensiveness of De-Diffusion text as a representation of images, such that it can be readily ingested by off-the-shelf text-to-image tools and LLMs for diverse multi-modal tasks. For example, a single De-Diffusion model generalizes to provide transferable prompts for different text-to-image tools, and it also achieves a new state of the art on open-ended vision-language tasks by simply prompting large language models with few-shot examples.
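
The autoencoding loop described above (image to text, then text back to image through a fixed decoder) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and none of it is the authors' implementation: the "image" is a single RGB color, the frozen decoder is a hypothetical word-to-color lookup named `WORD_TO_RGB`, and the encoder is an exhaustive search over two-word texts rather than a trained network. The one idea it keeps is that the text is chosen so that the fixed decoder reconstructs the input as closely as possible.

```python
from itertools import product

# Hypothetical toy vocabulary: each word maps to an RGB color.
# This lookup stands in for the frozen, pre-trained text-to-image decoder.
WORD_TO_RGB = {
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
    "black": (0.0, 0.0, 0.0),
}


def decode(text):
    """Frozen decoder (text -> image): average the colors of the words."""
    words = text.split()
    return tuple(
        sum(WORD_TO_RGB[w][c] for w in words) / len(words) for c in range(3)
    )


def reconstruction_loss(image, reconstruction):
    """Squared error between the input image and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(image, reconstruction))


def encode(image, num_words=2):
    """Encoder (image -> text).

    Assumption: exhaustive search over all word sequences replaces the
    trained encoder network; the decoder is never modified.
    """
    candidates = (" ".join(ws) for ws in product(WORD_TO_RGB, repeat=num_words))
    return min(candidates, key=lambda t: reconstruction_loss(image, decode(t)))


purple = (0.5, 0.0, 0.5)
text = encode(purple)          # the text bottleneck
reconstruction = decode(text)  # the image rebuilt from text alone
print(text, reconstruction, reconstruction_loss(purple, reconstruction))
```

Because the bottleneck is plain text, the encoder's output can be read directly or handed to any other consumer of text, which is the property the abstract relies on when it passes De-Diffusion text to other text-to-image tools and LLMs.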
