Editing natural images from textual descriptions with text-to-image diffusion models remains a significant challenge, particularly in achieving consistent generation and handling complex, non-rigid objects. Existing methods often fail to preserve texture and identity, require extensive fine-tuning, and have difficulty editing a specific spatial region or object while retaining background details. This paper proposes Context-Preserving Adaptive Manipulation (CPAM), a novel zero-shot framework for complex, non-rigid editing of real images. Specifically, we propose a preservation adaptation module that adjusts self-attention mechanisms to preserve and independently control the object and the background. Guided by masks, this module maintains the object's shape, texture, and identity while keeping the background undistorted during editing. Additionally, we develop a localized extraction module that mitigates interference with regions not intended to be modified during conditioning in cross-attention mechanisms. We also introduce several mask-guidance strategies that support diverse image manipulation tasks in a simple manner. CPAM can be seamlessly integrated with multiple diffusion backbones, including SD1.5, SD2.1, and SDXL, demonstrating strong generalization across model architectures. Extensive experiments on our newly constructed Image Manipulation BenchmArk (IMBA), a robust benchmark dataset specifically designed for real image editing, demonstrate that our method is preferred by human raters and outperforms existing state-of-the-art editing techniques. The source code and data will be publicly released at the project page: https://vdkhoi20.github.io/CPAM
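The abstract does not give the exact formulation of the preservation adaptation module, so the following is only a minimal sketch of one way mask-guided self-attention could separate object and background. It assumes that each query token attends only to source tokens from the same region (object or background). The function name `masked_self_attention` and its parameters are hypothetical and are not taken from the CPAM code.

```python
import math


def masked_self_attention(queries, keys, values, query_is_object, key_is_object):
    """Toy mask-guided attention (illustrative assumption, not the CPAM implementation).

    queries: list of query vectors from the image being edited.
    keys, values: lists of key/value vectors from the source image.
    query_is_object, key_is_object: boolean masks marking object tokens.
    Each query attends only to keys whose mask value matches its own.
    """
    if all(key_is_object) or not any(key_is_object):
        raise ValueError("key mask must contain both object and background tokens")

    scale = 1.0 / math.sqrt(len(queries[0]))
    value_dim = len(values[0])
    outputs = []
    for q, q_obj in zip(queries, query_is_object):
        # Indices of source tokens in the same region as this query.
        idx = [j for j, k_obj in enumerate(key_is_object) if k_obj == q_obj]
        # Scaled dot-product scores restricted to that region.
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in idx]
        # Numerically stable softmax over the allowed keys only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the allowed value vectors.
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, idx)) / total
            for c in range(value_dim)
        ])
    return outputs


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0], [0.0], [1.0]]  # object tokens carry 1.0, background 0.0
    print(masked_self_attention(queries, keys, values,
                                [True, False], [True, False, True]))
```

In this toy setup the object query only mixes object values and the background query only mixes background values, which is the kind of independent control the abstract describes; a real diffusion backbone would apply such masking inside its attention layers on batched tensors.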