SpaText: Spatio-Textual Representation for Controllable Image Generation - 专知论文

会员服务 ·

0

控制器 · state-of-the-art · MoDELS · 表示 · Guidance ·

2023 年 3 月 19 日

SpaText: Spatio-Textual Representation for Controllable Image Generation

翻译：SpaText：面向可控图像生成的时空文本表示

Omri Avrahami,Thomas Hayes,Oran Gafni,Sonal Gupta,Yaniv Taigman,Devi Parikh,Dani Lischinski,Ohad Fried,Xi Yin

from arxiv, CVPR 2023. Project page available at: https://omriavrahami.com/spatext

Recent text-to-image diffusion models are able to generate convincing results of unprecedented quality. However, it is nearly impossible to control the shapes of different regions/objects or their layout in a fine-grained fashion. Previous attempts to provide such controls were hindered by their reliance on a fixed set of labels. To this end, we present SpaText - a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map where each region of interest is annotated by a free-form natural language description. Due to lack of large-scale datasets that have a detailed textual description for each region in the image, we choose to leverage the current large-scale text-to-image datasets and base our approach on a novel CLIP-based spatio-textual representation, and show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based. In addition, we show how to extend the classifier-free guidance method in diffusion models to the multi-conditional case and present an alternative accelerated inference algorithm. Finally, we offer several automatic evaluation metrics and use them, in addition to FID scores and a user study, to evaluate our method and show that it achieves state-of-the-art results on image generation with free-form textual scene control.

翻译：近期文本到图像的扩散模型能够生成质量空前的逼真结果。然而，现有方法几乎无法精细控制不同区域/物体的形状或其布局。此前提供此类控制的尝试受限于对固定标签集合的依赖。为此，我们提出SpaText——一种利用开放词汇场景控制进行文本到图像生成的新方法。除描述整个场景的全局文本提示外，用户还可提供分割图，其中每个感兴趣区域由自由形式的自然语言描述标注。由于缺乏对图像各区域配备详细文本描述的大规模数据集，我们选择利用现有大规模文本-图像数据集，基于新颖的CLIP时空文本表示构建方法，并在两种最先进的扩散模型（像素级与潜变量级）上验证其有效性。此外，我们展示了如何将扩散模型中的无分类器引导方法扩展至多条件情况，并提出一种替代性加速推理算法。最后，我们提出多项自动评估指标，结合FID分数与用户研究，评估表明本方法在自由形式文本场景控制的图像生成任务中达到了最先进水平。

0

相关内容

控制器

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

专知会员服务

23+阅读 · 2022年3月3日

【TPAMI2022】从展示到讲述: 基于深度学习的图像描述研究综述论文，From Show to Tell: A Survey on Deep Learning-based Image Captioning

【TPAMI2022】从展示到讲述: 基于深度学习的图像描述研究综述论文，From Show to Tell: A Survey on Deep Learning-based Image Captioning

专知会员服务

24+阅读 · 2022年3月1日

ICCV'21 Oral｜拒绝调参，显著提点！检测分割任务的新损失函数RS Loss开源

专知会员服务

16+阅读 · 2021年8月11日

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

专知会员服务

34+阅读 · 2020年6月19日

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

专知会员服务

25+阅读 · 2020年5月22日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

专知

25+阅读 · 2018年5月28日

【论文推荐】最新7篇变分自编码器（VAE）相关论文—汉语诗歌、生成模型、跨模态、MR图像重建、机器翻译、推断、合成人脸

【论文推荐】最新7篇变分自编码器（VAE）相关论文—汉语诗歌、生成模型、跨模态、MR图像重建、机器翻译、推断、合成人脸

专知

11+阅读 · 2018年2月12日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

微气泡（群）生成的介尺度机理及工业微气泡发生器科学基础

国家自然科学基金

0+阅读 · 2015年12月31日

一维量子简并气体关联与临界性质研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于海量语料自然标注信息的汉语自然语块分析

国家自然科学基金

0+阅读 · 2013年12月31日

基于GΓMM和稀疏表示的高分辨率SAR图像变化检测研究

国家自然科学基金

0+阅读 · 2012年12月31日

非光滑近Hamilton系统全局动力学特性的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于语义的图像合成

国家自然科学基金

0+阅读 · 2011年12月31日

基于融合的全向深度图像的生成及应用研究

国家自然科学基金

0+阅读 · 2010年12月31日

基于视频语义理解的艺术风格化研究

国家自然科学基金

1+阅读 · 2009年12月31日

基于生物视觉机制的语义图像检索模型及方法

国家自然科学基金

0+阅读 · 2009年12月31日

印刷图像颜色信息的高保真传输与再现

国家自然科学基金

0+阅读 · 2009年12月31日

Robust Image Ordinal Regression with Controllable Image Generation

Arxiv

0+阅读 · 2023年5月10日

Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator

Arxiv

0+阅读 · 2023年5月9日

E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

Arxiv

0+阅读 · 2023年5月9日

Locally Attentional SDF Diffusion for Controllable 3D Shape Generation

Arxiv

0+阅读 · 2023年5月9日

Controllable Data Generation by Deep Learning: A Review

Arxiv

15+阅读 · 2022年7月19日

A Decade Survey of Content Based Image Retrieval using Deep Learning

Arxiv

23+阅读 · 2020年11月23日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Image Segmentation Using Deep Learning: A Survey

Image Segmentation Using Deep Learning: A Survey

Arxiv

47+阅读 · 2020年1月15日

Image Captioning using Deep Neural Architectures

Arxiv

20+阅读 · 2018年1月17日

Exploring Models and Data for Remote Sensing Image Caption Generation

Arxiv

14+阅读 · 2017年12月21日

VIP会员

文章信息

相关主题

state-of-the-art

最新内容

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

专知会员服务

0+阅读 · 今天15:52

《边缘端实时无线感知赋能现场多机器人部署》200页

《边缘端实时无线感知赋能现场多机器人部署》200页

专知会员服务

2+阅读 · 今天15:32

战力倍增器：自主武器系统与乌克兰及加沙冲突

战力倍增器：自主武器系统与乌克兰及加沙冲突

专知会员服务

1+阅读 · 今天15:24

人工智能赋能战场情报：提速决策进程

人工智能赋能战场情报：提速决策进程

专知会员服务

0+阅读 · 今天15:15

《拥抱新兴技术：面向未来军官的教育革新》

《拥抱新兴技术：面向未来军官的教育革新》

专知会员服务

2+阅读 · 今天15:11

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

ACM MM 2026 | MAR-GRPO：稳定混合图像生成的强化学习训练

专知会员服务

0+阅读 · 今天14:43

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

综述 | 大模型水印理论与部署：来源追踪、攻击鲁棒与可信治理

专知会员服务

0+阅读 · 今天14:40

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

《火线上的后勤保障：对抗环境下的随机规划模型研究——俄乌场景案例分析》99页

专知会员服务

11+阅读 · 7月16日

《无人地面战车（UGV）的崛起》报告

《无人地面战车（UGV）的崛起》报告

专知会员服务

7+阅读 · 7月16日

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

《无人机参数化与集群飞行创新项目的监控流程管理：模型、策略及自适应解决方案》

专知会员服务

6+阅读 · 7月16日

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

《美军开放式任务系统（OMS）定义与文档（D&D）——Java关键抽象层（CAL）接口生成规范》47页标准

专知会员服务

12+阅读 · 7月16日

美陆军任务式指挥人工智能解决方案

美陆军任务式指挥人工智能解决方案

专知会员服务

11+阅读 · 7月16日

ICML 2026 | 理论级自动形式化：从孤立命题到统一形式化知识库

ICML 2026 | 理论级自动形式化：从孤立命题到统一形式化知识库

专知会员服务

8+阅读 · 7月16日

综述 | 现代智能体自我改进，从模型更新到脚手架演化

综述 | 现代智能体自我改进，从模型更新到脚手架演化

专知会员服务

14+阅读 · 7月16日

美国陆军宣布“项目融合-顶点6”：现代化进程的关键里程碑

美国陆军宣布“项目融合-顶点6”：现代化进程的关键里程碑

专知会员服务

13+阅读 · 7月15日

相关VIP内容

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

【CVPR 2022】可控图像合成与编辑的合成生成先验学习，SemanticStyleGAN: Learning Compositonal Generative Priors for Controllable Image Synthesis and Editing

专知会员服务

23+阅读 · 2022年3月3日

【TPAMI2022】从展示到讲述: 基于深度学习的图像描述研究综述论文，From Show to Tell: A Survey on Deep Learning-based Image Captioning

【TPAMI2022】从展示到讲述: 基于深度学习的图像描述研究综述论文，From Show to Tell: A Survey on Deep Learning-based Image Captioning

专知会员服务

24+阅读 · 2022年3月1日

ICCV'21 Oral｜拒绝调参，显著提点！检测分割任务的新损失函数RS Loss开源

专知会员服务

16+阅读 · 2021年8月11日

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

专知会员服务

34+阅读 · 2020年6月19日

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

专知会员服务

25+阅读 · 2020年5月22日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《边缘端实时无线感知赋能现场多机器人部署》200页

人工智能赋能战场情报：提速决策进程

DARPA拟打造十万规模自主思考作战的AI智能体集群：“受控涌现式分布式人工智能”（DICE）项目

战力倍增器：自主武器系统与乌克兰及加沙冲突

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

专知

25+阅读 · 2018年5月28日

【论文推荐】最新7篇变分自编码器（VAE）相关论文—汉语诗歌、生成模型、跨模态、MR图像重建、机器翻译、推断、合成人脸

【论文推荐】最新7篇变分自编码器（VAE）相关论文—汉语诗歌、生成模型、跨模态、MR图像重建、机器翻译、推断、合成人脸

专知

11+阅读 · 2018年2月12日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

相关论文

Robust Image Ordinal Regression with Controllable Image Generation

Arxiv

0+阅读 · 2023年5月10日

Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator

Arxiv

0+阅读 · 2023年5月9日

E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

Arxiv

0+阅读 · 2023年5月9日

Locally Attentional SDF Diffusion for Controllable 3D Shape Generation

Arxiv

0+阅读 · 2023年5月9日

Controllable Data Generation by Deep Learning: A Review

Arxiv

15+阅读 · 2022年7月19日

A Decade Survey of Content Based Image Retrieval using Deep Learning

Arxiv

23+阅读 · 2020年11月23日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Image Segmentation Using Deep Learning: A Survey

Image Segmentation Using Deep Learning: A Survey

Arxiv

47+阅读 · 2020年1月15日

Image Captioning using Deep Neural Architectures

Arxiv

20+阅读 · 2018年1月17日

Exploring Models and Data for Remote Sensing Image Caption Generation

Arxiv

14+阅读 · 2017年12月21日

相关基金

微气泡（群）生成的介尺度机理及工业微气泡发生器科学基础

国家自然科学基金

0+阅读 · 2015年12月31日

一维量子简并气体关联与临界性质研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于海量语料自然标注信息的汉语自然语块分析

国家自然科学基金

0+阅读 · 2013年12月31日

基于GΓMM和稀疏表示的高分辨率SAR图像变化检测研究

国家自然科学基金

0+阅读 · 2012年12月31日

非光滑近Hamilton系统全局动力学特性的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于语义的图像合成

国家自然科学基金

0+阅读 · 2011年12月31日

基于融合的全向深度图像的生成及应用研究

国家自然科学基金

0+阅读 · 2010年12月31日

基于视频语义理解的艺术风格化研究

国家自然科学基金

1+阅读 · 2009年12月31日

基于生物视觉机制的语义图像检索模型及方法

国家自然科学基金

0+阅读 · 2009年12月31日

印刷图像颜色信息的高保真传输与再现

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员