DDIM inversion has revealed the remarkable potential of diffusion-based methods for real image editing. However, the accuracy of DDIM reconstruction degrades as larger classifier-free guidance (CFG) scales are used for enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories under larger CFG scales, enabling real image editing with cross-attention control. Negative-prompt inversion (NPI) further offers a training-free, closed-form solution to NTI. However, it may introduce artifacts and remains constrained by DDIM reconstruction quality. To overcome these limitations, we propose proximal guidance and incorporate it into NPI with cross-attention control. We enhance NPI with a regularization term and reconstruction guidance, which reduce artifacts while preserving its training-free nature. Additionally, we extend these concepts to mutual self-attention control, enabling geometry and layout alterations during editing. Our method offers an efficient and straightforward approach to real image editing with minimal computational overhead.
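The core idea behind NPI can be sketched in a few lines. In NPI, the null-text noise prediction in classifier-free guidance is replaced by the prediction conditioned on the source prompt, so the guided estimate becomes `eps_src + w * (eps_edit - eps_src)`. When the edit prompt equals the source prompt, the guidance term vanishes for any scale `w`, which is why reconstruction does not degrade as `w` grows. The following is a minimal NumPy sketch under stated assumptions. It takes the two noise predictions as precomputed arrays, and it uses soft-thresholding (the proximal operator of the L1 norm) as a stand-in regularizer on the guidance term. The paper's actual proximal function and its reconstruction guidance may differ, and all function names here are hypothetical.

```python
import numpy as np


def soft_threshold(x: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||x||_1: shrinks every entry toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def npi_guided_noise(eps_src: np.ndarray, eps_edit: np.ndarray,
                     scale: float, lam: float = 0.0) -> np.ndarray:
    """NPI-style CFG: the source-prompt prediction takes the place of the
    null-text prediction. Hypothetical sketch; lam > 0 applies an assumed
    L1 proximal step to the guidance direction to suppress small, noisy
    differences that could show up as artifacts."""
    guidance = eps_edit - eps_src
    if lam > 0.0:
        guidance = soft_threshold(guidance, lam)
    return eps_src + scale * guidance


# Usage with random stand-ins for the UNet's noise predictions.
rng = np.random.default_rng(0)
eps_src = rng.standard_normal((4, 8, 8))
eps_edit = eps_src + 0.1 * rng.standard_normal((4, 8, 8))
eps = npi_guided_noise(eps_src, eps_edit, scale=7.5, lam=0.05)
```

Thresholding the guidance direction, rather than the full noise estimate, leaves the source-conditioned prediction untouched. Only the edit signal is sparsified, so regions the edit barely affects stay close to the reconstruction.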