Learning Continuous 3D Words for Text-to-Image Generation

Current controls over diffusion models (e.g., through text or ControlNet) for image generation fall short in recognizing abstract, continuous attributes such as illumination direction or non-rigid shape change. In this paper, we present an approach that gives users of text-to-image models fine-grained control over several attributes in an image. We do this by engineering special sets of input tokens that can be transformed in a continuous manner, which we call Continuous 3D Words. These attributes can, for example, be represented as sliders and applied jointly with text prompts for fine-grained control over image generation. Given only a single mesh and a rendering engine, we show that our approach can be adopted to provide continuous user control over several 3D-aware attributes, including time-of-day illumination, bird wing orientation, dolly zoom effect, and object poses. Our method is capable of conditioning image creation on multiple Continuous 3D Words and text descriptions simultaneously while adding no overhead to the generative process. Project Page: https://ttchengab.github.io/continuous_3d_words
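
To make the idea of a token that "can be transformed in a continuous manner" concrete, here is a minimal sketch of mapping a slider value to a token embedding. It is an illustration only, not the authors' implementation: the sinusoidal features, the single `tanh` layer, the fixed random weights standing in for learned parameters, and the names `make_continuous_word` and `distance` are all assumptions introduced here.

```python
import math
import random


def make_continuous_word(embed_dim=8, num_freqs=4, seed=0):
    """Return a function mapping a slider value in [0, 1] to an embedding.

    Hypothetical helper: the weights are fixed random numbers that stand in
    for parameters a real system would learn from rendered images.
    """
    rng = random.Random(seed)
    feat_dim = 2 * num_freqs
    weights = [
        [rng.uniform(-1.0, 1.0) for _ in range(feat_dim)]
        for _ in range(embed_dim)
    ]

    def embed(value):
        if not 0.0 <= value <= 1.0:
            raise ValueError("value must lie in [0, 1]")
        # Sinusoidal features of the attribute value (an assumed encoding).
        feats = []
        for k in range(num_freqs):
            angle = (2 ** k) * math.pi * value
            feats.extend((math.sin(angle), math.cos(angle)))
        # One linear layer followed by tanh; every step is continuous in
        # `value`, so the output embedding varies smoothly with the slider.
        return [
            math.tanh(sum(w * f for w, f in zip(row, feats)))
            for row in weights
        ]

    return embed


def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# A hypothetical "illumination" word driven by a slider in [0, 1].
illumination = make_continuous_word()
morning = illumination(0.30)
slightly_later = illumination(0.31)
print(len(morning))
print(distance(morning, slightly_later))
```

In a text-to-image pipeline, a vector like `morning` would take the place of an ordinary word embedding in the prompt, so moving the slider changes the conditioning without adding extra steps at generation time. How the paper actually parameterizes and trains this mapping is not specified in the abstract.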

