Continuous Layout Editing of Single Images with Diffusion Models

Recent advances in large-scale text-to-image diffusion models have enabled many applications in image editing. However, none of these methods can edit the layout of a single existing image. To address this gap, we propose the first framework for layout editing of a single image while preserving its visual properties, thus allowing for continuous editing on a single image. Our approach relies on two key modules. First, to preserve the characteristics of multiple objects within an image, we disentangle the concepts of different objects and embed them into separate textual tokens using a novel method called masked textual inversion. Next, we propose a training-free optimization method to perform layout control for a pre-trained diffusion model, which allows us to regenerate images with the learned concepts and align them with user-specified layouts. As the first framework to edit the layout of existing images, we demonstrate that our method is effective and outperforms baselines that were modified to support this task. Our code will be freely available for public use upon acceptance.
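The abstract does not spell out the loss behind masked textual inversion. One plausible reading, and it is only an assumption here, is that the reconstruction error is restricted to each object's mask so that a token learns only from its own object's pixels. A minimal sketch of such a masked mean-squared error in plain Python follows; the function name `masked_mse` and the toy 2x2 data are hypothetical and not taken from the paper.

```python
def masked_mse(pred, target, mask):
    """Mean squared error computed only over positions where mask is non-zero.

    Hypothetical helper: illustrates restricting a loss to an object's region.
    pred, target, and mask are equally sized 2D lists.
    """
    total = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:  # only masked-in pixels contribute
                total += (p - t) ** 2
                count += 1
    # An empty mask contributes no loss rather than dividing by zero.
    return total / count if count else 0.0


# Toy data (assumed for illustration): only the top-left pixel belongs to the object.
pred = [[1.0, 2.0], [3.0, 4.0]]
target = [[0.0, 2.0], [3.0, 0.0]]
object_mask = [[1, 0], [0, 0]]
full_mask = [[1, 1], [1, 1]]

object_loss = masked_mse(pred, target, object_mask)
full_loss = masked_mse(pred, target, full_mask)
```

With the object mask, the large error at the bottom-right pixel is ignored, which is the intended effect of confining each token's learning signal to its own object.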

