In image editing, Null-text Inversion (NTI) enables fine-grained editing while preserving the structure of the original image by optimizing null-text embeddings during DDIM sampling. However, NTI is time-consuming, taking more than two minutes per image. To address this, we introduce a method that retains the principles of NTI while accelerating the editing process. We propose the WaveOpt-Estimator, which determines the endpoint of text optimization from the image's frequency characteristics. Using wavelet transform analysis to identify these characteristics, we restrict text optimization to specific timesteps of the DDIM sampling process. Following the Negative-Prompt Inversion (NPI) concept, we use a target prompt that represents the original image as the initial text value for optimization. This approach maintains performance comparable to NTI while reducing the average editing time by over 80%. Our method offers a promising approach to efficient, high-quality image editing with diffusion models.
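To make the idea of choosing an optimization endpoint from frequency characteristics concrete, the following is a minimal sketch and not the WaveOpt-Estimator itself. It assumes a one-level orthonormal Haar wavelet transform, summarizes the image by the share of energy in the detail sub-bands, and maps that share linearly to a DDIM timestep index. The function names (`haar_dwt2`, `high_freq_ratio`, `estimate_endpoint`), the `min_frac` parameter, and the linear mapping with its direction are all hypothetical placeholders chosen for illustration.

```python
import numpy as np


def haar_dwt2(img):
    """One-level 2D orthonormal Haar transform of a 2D array.

    Odd trailing rows/columns are cropped so that 2x2 blocks tile the image.
    Returns the approximation band LL and the detail bands (LH, HL, HH).
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, (lh, hl, hh)


def high_freq_ratio(img):
    """Fraction of image energy that falls in the Haar detail bands."""
    img = np.asarray(img, dtype=np.float64)
    if img.ndim == 3:
        # Assumption: average the channels to obtain a grayscale image.
        img = img.mean(axis=-1)
    ll, details = haar_dwt2(img)
    detail_energy = sum(float(np.sum(band ** 2)) for band in details)
    total_energy = float(np.sum(ll ** 2)) + detail_energy
    if total_energy == 0.0:
        return 0.0
    return detail_energy / total_energy


def estimate_endpoint(img, num_steps=50, min_frac=0.2):
    """Hypothetical mapping from frequency content to an endpoint step.

    Assumption: more high-frequency energy means more optimization steps.
    This linear rule is a placeholder, not the estimator from the paper.
    """
    lo = int(round(min_frac * num_steps))
    return lo + int(round(high_freq_ratio(img) * (num_steps - lo)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64, 3))
    print("high-frequency ratio:", high_freq_ratio(image))
    print("endpoint step:", estimate_endpoint(image))
```

In an editing pipeline, text optimization would run only up to the returned step and be skipped for the remaining DDIM timesteps, which is where a reduction in editing time would come from. A learned estimator could replace the hand-written linear rule without changing this interface.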