Current image editing methods primarily rely on DDIM Inversion, using a two-branch diffusion approach to preserve the attributes and layout of the original image. However, these methods struggle with non-rigid edits, which alter the image's layout or structure. Our analysis shows that the high-frequency components of the DDIM latent, which are crucial for retaining the original image's key features and layout, contribute significantly to this limitation. To address this, we introduce FlexiEdit, which improves fidelity to the input text prompt by refining the DDIM latent, specifically by reducing its high-frequency components in the targeted editing areas. FlexiEdit comprises two key components: (1) Latent Refinement, which modifies the DDIM latent to better accommodate layout changes, and (2) Edit Fidelity Enhancement via Re-inversion, which ensures that the edits reflect the input text prompt more accurately. Comparative experiments demonstrate the improved capability of our approach, particularly on complex non-rigid edits.
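To make the idea of reducing high-frequency components in targeted editing areas concrete, the following is a minimal sketch and not the authors' implementation. It assumes the latent is a NumPy array of shape `(C, H, W)`, that the edit region is given as a mask of shape `(H, W)`, and that an ideal circular low-pass filter in the Fourier domain is an acceptable stand-in for the refinement step. The function name `reduce_high_freq` and the `cutoff` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def reduce_high_freq(latent, mask, cutoff=0.25):
    """Low-pass filter `latent` inside `mask`, leaving the rest untouched.

    latent: float array of shape (C, H, W) (assumed layout).
    mask:   float array of shape (H, W), 1 inside the edit region, 0 outside.
    cutoff: radial frequency threshold in cycles per sample (hypothetical
            parameter; frequencies above it are discarded).
    """
    _, height, width = latent.shape

    # Frequency coordinates of each FFT bin, in cycles per sample.
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Ideal low-pass filter: keep bins at or below the cutoff radius.
    lowpass = (radius <= cutoff).astype(latent.dtype)

    # Filter every channel in the Fourier domain.
    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    smoothed = np.fft.ifft2(spectrum * lowpass, axes=(-2, -1)).real

    # Blend: filtered latent inside the mask, original latent outside.
    return mask * smoothed + (1.0 - mask) * latent
```

In this sketch the mask restricts the filtering to the region to be edited, so the latent outside that region is returned unchanged. How the filter, cutoff, and mask are actually chosen in FlexiEdit is not specified in the abstract.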