Recent advances in large-scale text-to-image diffusion models have enabled many image editing applications. However, none of these methods can edit the layout of a single existing image. To address this gap, we propose the first framework for editing the layout of a single image while preserving its visual properties, which allows continuous editing of that image. Our approach consists of two key modules. First, to preserve the characteristics of multiple objects within an image, we disentangle the concepts of different objects and embed them into separate textual tokens using a novel method called masked textual inversion. Second, we propose a training-free optimization method that performs layout control for a pre-trained diffusion model, allowing us to regenerate images with the learned concepts and align them with user-specified layouts. We demonstrate that our method is effective and outperforms baselines that were modified to support this task. Our code will be freely available for public use upon acceptance.
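The abstract does not spell out the objective behind masked textual inversion. One plausible reading is that the reconstruction loss for each object's token is restricted to that object's mask, so that each token only has to explain its own region. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `masked_mse` and the toy arrays are hypothetical and chosen only for illustration.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over the positions where mask is non-zero.

    pred, target, mask: equally sized 2D lists. This is an illustrative
    stand-in for a per-object denoising loss; the real objective is not
    given in the abstract.
    """
    total = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no positions")
    return total / count


# Toy 2x2 example: one hypothetical object occupies the left column.
pred = [[1.0, 5.0], [3.0, 5.0]]
target = [[0.0, 0.0], [1.0, 0.0]]
left_object_mask = [[1, 0], [1, 0]]

# Errors in the right column are ignored by this object's loss.
loss = masked_mse(pred, target, left_object_mask)
```

In this reading, each object would get its own mask and its own token, and errors outside the mask would not influence that token's embedding.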