Image editing faces three persistent core challenges: controllability, background preservation, and efficiency. Inversion-based methods rely on time-consuming optimization to preserve the features of the initial image, and the many network inferences this requires make them inefficient. Inversion-free methods, conversely, gain efficiency by sidestepping the problem of maintaining the initial features, and therefore lack theoretical support for background similarity. As a consequence, neither family achieves both high efficiency and background consistency. To address these shortcomings, we introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process. Specifically, a measurement term related to both the initial features and Langevin dynamics is introduced to optimize the estimated image generated from the given target prompt. Extensive experimental results indicate that PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions. Furthermore, the method is both inversion-free and training-free, requiring approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.
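The abstract does not state the update rule, so the following is only a generic sketch of how a measurement term can steer an estimate through Langevin dynamics. It is not PostEdit's actual implementation. It assumes a linear measurement operator `A`, a Gaussian measurement noise level `sigma_y`, and a Gaussian pull of width `prior_std` toward the prompt-driven estimate. The function name `langevin_refine` and all parameter names are hypothetical, and plain vectors stand in for image features.

```python
import numpy as np


def langevin_refine(x_est, y, A, *, steps=200, step_size=1e-3,
                    sigma_y=0.1, prior_std=1.0, noise_scale=1.0, rng=None):
    """Refine an estimate with Langevin steps under a measurement term.

    Assumed (hypothetical) setup, not taken from the paper:
      x_est : estimate produced from the target prompt, shape (d,)
      y     : measurement of the initial image, y = A @ x_init, shape (m,)
      A     : linear measurement operator, shape (m, d)
    """
    rng = np.random.default_rng() if rng is None else rng
    x_est = np.asarray(x_est, dtype=float)
    x = x_est.copy()
    for _ in range(steps):
        # Gradient of ||A x - y||^2 / (2 sigma_y^2): consistency with the initial image.
        grad_meas = A.T @ (A @ x - y) / sigma_y**2
        # Gradient of ||x - x_est||^2 / (2 prior_std^2): stay near the edited estimate.
        grad_prior = (x - x_est) / prior_std**2
        noise = rng.standard_normal(x.shape)
        # Langevin update: gradient step plus Gaussian noise of scale sqrt(2 * step_size).
        x = (x - step_size * (grad_meas + grad_prior)
             + noise_scale * np.sqrt(2.0 * step_size) * noise)
    return x


# Toy usage: measure only the first four coordinates (a stand-in for "unedited" features).
rng = np.random.default_rng(0)
x_init = rng.standard_normal(8)
x_est = x_init + rng.standard_normal(8)
A = np.eye(8)[:4]
y = A @ x_init
x_ref = langevin_refine(x_est, y, A, noise_scale=0.0)  # noise off: plain gradient descent
```

With `noise_scale=0.0` the loop reduces to deterministic gradient descent, which makes the effect easy to inspect: the measured coordinates are pulled toward the initial image while the unmeasured ones stay at the prompt-driven estimate. Leaving `noise_scale` at its default of 1 gives the stochastic Langevin update.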