Current methods commonly adopt a three-branch structure of inversion, reconstruction, and editing to tackle the consistent image editing task. However, these methods lack control over where the edited object is generated and struggle with background preservation. To overcome these limitations, we propose a tuning-free method with only two branches: inversion and editing. This approach allows users to edit the object's action while simultaneously controlling the generation position of the edited object, and it also achieves improved background preservation. Specifically, at a specific time step of the inversion process, we transfer the edited object's information to the target area and repair or preserve the background in the remaining areas. In the editing stage, the image features in self-attention query the keys and values of the corresponding time step in the inversion to achieve consistent image editing. Impressive image editing results and quantitative evaluation demonstrate the effectiveness of our method. The code is available at https://github.com/mobiushy/move-act.
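The editing-stage mechanism, where editing-branch queries attend to the keys and values cached from the inversion branch at the matching time step, can be sketched as below. This is a minimal illustrative sketch of KV injection in self-attention; the function and variable names (`edit_self_attention`, `q_edit`, `k_inv`, `v_inv`) are our assumptions, not the API of the released code:

```python
import torch

def edit_self_attention(q_edit, k_inv, v_inv):
    """Illustrative KV-injection step for consistent editing.

    q_edit: queries from the editing branch at time step t,
            shape (batch, tokens, dim).
    k_inv, v_inv: keys/values cached from the inversion branch
                  at the same time step t, same shape.
    Returns the attention output that carries source-image
    appearance into the edited result.
    """
    d = q_edit.shape[-1]
    # standard scaled dot-product attention, but against the
    # inversion branch's keys and values instead of the edit's own
    attn = torch.softmax(q_edit @ k_inv.transpose(-2, -1) * d**-0.5, dim=-1)
    return attn @ v_inv

# toy usage with random features
q = torch.randn(1, 16, 64)
k = torch.randn(1, 16, 64)
v = torch.randn(1, 16, 64)
out = edit_self_attention(q, k, v)
```

In practice such hooks are registered on the self-attention layers of a pretrained diffusion U-Net, so the appearance of the source object is preserved while the text prompt and target region drive the new pose and position.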