Editing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory results, and therefore many methods additionally intervene in the sampling process. Such methods achieve improved results, but are not seamlessly transferable between model architectures. Here, we introduce FlowEdit, a text-based editing method for pre-trained T2I flow models, which is inversion-free, optimization-free, and model-agnostic. Our method constructs an ODE that directly maps between the source and target distributions (corresponding to the source and target text prompts) and achieves a lower transport cost than the inversion approach. This leads to state-of-the-art results, as we illustrate with Stable Diffusion 3 and FLUX. Code and examples are available on the project's webpage.
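The abstract's central construction, an ODE whose drift is the difference between target- and source-conditioned velocities so that the source image is never inverted to a noise map, can be illustrated with a toy sketch. Everything below is an illustrative assumption rather than the paper's actual algorithm: the "prompts" are represented by hypothetical point masses `mu`, and `v(x, t, c)` is the exact flow-matching velocity for a point-mass target under the interpolation x_t = (1 - t) x_0 + t * noise.

```python
import numpy as np

# Toy 2-D stand-in: each "prompt" c is a point mass mu[c]; v(x, t, c) is the
# exact flow-matching velocity toward it. All names here are illustrative.
mu = {"src": np.array([0.0, 0.0]), "tar": np.array([3.0, 1.0])}

def v(x, t, c):
    # Marginal velocity for a point-mass data distribution; singular at t = 0,
    # so it is only ever evaluated at t > 0 below.
    return (x - mu[c]) / t

def flow_edit_sketch(x_src, n_steps=100, seed=0):
    """Integrate a direct source->target ODE: the edit state is driven by the
    DIFFERENCE of target- and source-conditioned velocities, so x_src is
    never inverted to a noise map."""
    rng = np.random.default_rng(seed)
    z = x_src.copy()  # the edit trajectory starts at the source image itself
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        noise = rng.standard_normal(x_src.shape)
        z_src = (1 - t) * x_src + t * noise   # noisy source sample at time t
        z_tar = z + (z_src - x_src)           # coupled noisy "target" sample
        # Euler step on the velocity difference (integrating t: 1 -> 0).
        z = z - (t - t_next) * (v(z_tar, t, "tar") - v(z_src, t, "src"))
    return z
```

In this toy setup the sampled noise cancels inside the velocity difference, and the edit converges to the source point shifted by the gap between the two prompt means; the point of the sketch is only the structure of the update, with the same noise realization coupling the source and target trajectories at every step.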