Recent advances in text-guided image editing enable users to perform image edits through simple text inputs, leveraging the extensive priors of multi-step diffusion-based text-to-image models. However, these methods often fall short of the speed demands required for real-world and on-device applications due to the costly multi-step inversion and sampling process involved. In response to this, we introduce SwiftEdit, a simple yet highly efficient editing tool that achieves instant text-guided image editing (in 0.23s). The advancement of SwiftEdit lies in its two novel contributions: a one-step inversion framework that enables one-step image reconstruction via inversion, and a mask-guided editing technique with our proposed attention rescaling mechanism to perform localized image editing. Extensive experiments demonstrate the effectiveness and efficiency of SwiftEdit. In particular, SwiftEdit enables instant text-guided image editing, running at least 50 times faster than previous multi-step methods while maintaining competitive editing quality. Our project page is at: https://swift-edit.github.io/