Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPMs that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). Moreover, in text-conditional models, fixing those noise maps while changing the text prompt modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity. Webpage: https://inbarhub.github.io/DDPM_inversion
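The perfect-reconstruction property claimed in the abstract can be illustrated with a small NumPy sketch. It rests on assumptions the abstract itself does not spell out: each noisy latent x_t is drawn independently from the forward marginal q(x_t | x_0), and each noise map z_t is then read off as the residual of the reverse step, z_t = (x_{t-1} - mu_t(x_t)) / sigma_t. The function `toy_eps_model` is a hypothetical stand-in for a trained noise predictor, sigma_t = sqrt(beta_t) is one common DDPM variance choice and not necessarily the one used by the authors, and all names are illustrative.

```python
import numpy as np

def make_schedule(T=50, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule; array index i holds the value for timestep t = i + 1."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def toy_eps_model(x_t, t):
    """Hypothetical stand-in for a trained noise-prediction network."""
    return np.tanh(x_t) / (1.0 + t)

def reverse_mean(x_t, t, sched, eps_model):
    """Mean of the DDPM reverse step p(x_{t-1} | x_t) under the noise-prediction form."""
    betas, alphas, alpha_bars = sched
    i = t - 1
    eps = eps_model(x_t, t)
    return (x_t - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps) / np.sqrt(alphas[i])

def invert(x0, sched, eps_model, rng):
    """Extract noise maps z_T..z_1 that reproduce x0 (assumed construction)."""
    betas, _, alpha_bars = sched
    T = len(betas)
    # Assumption: every x_t uses its own independent noise sample.
    xs = [x0]
    for t in range(1, T + 1):
        noise = rng.standard_normal(x0.shape)
        ab = alpha_bars[t - 1]
        xs.append(np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise)
    # Each z_t is whatever the reverse step needs to land exactly on x_{t-1}.
    zs = {}
    for t in range(T, 0, -1):
        sigma = np.sqrt(betas[t - 1])
        zs[t] = (xs[t - 1] - reverse_mean(xs[t], t, sched, eps_model)) / sigma
    return xs[T], zs

def generate(x_T, zs, sched, eps_model):
    """Run the reverse process with fixed noise maps instead of fresh samples."""
    betas = sched[0]
    x = x_T
    for t in range(len(betas), 0, -1):
        x = reverse_mean(x, t, sched, eps_model) + np.sqrt(betas[t - 1]) * zs[t]
    return x

rng = np.random.default_rng(0)
sched = make_schedule()
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))   # toy "image"
x_T, zs = invert(x0, sched, toy_eps_model, rng)
x_rec = generate(x_T, zs, sched, toy_eps_model)
print("max reconstruction error:", np.abs(x_rec - x0).max())
```

In this sketch the reconstruction is exact up to floating-point error for any deterministic `eps_model`, because each z_t is defined as the residual that the reverse step must add. Editing would then amount to keeping `x_T` and `zs` fixed while changing the conditioning passed to the model, which this toy model does not include.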