Although existing large-scale text-to-image (T2I) models can generate high-quality images from detailed textual descriptions, they often cannot precisely edit generated or real images. In this paper, we propose DragonDiffusion, a novel image editing method that enables Drag-style manipulation on Diffusion models. Specifically, we construct classifier guidance based on the strong correspondence of intermediate features in the diffusion model. This guidance transforms editing signals into gradients via a feature correspondence loss, and the gradients modify the intermediate representation of the diffusion model. Building on this guidance strategy, we also introduce multi-scale guidance that accounts for both semantic and geometric alignment. Moreover, we add cross-branch self-attention to maintain consistency between the original image and the editing result. Through an efficient design, our method supports various editing modes for generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging. Notably, all editing and content preservation signals come from the image itself, and the model requires neither fine-tuning nor additional modules. Our source code will be available at https://github.com/MC-E/DragonDiffusion.
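To make the idea of turning an editing signal into a gradient through a feature correspondence loss more concrete, the following is a minimal toy sketch. It is not the DragonDiffusion implementation. It assumes the loss is one minus the cosine similarity between two flattened feature vectors, which the abstract does not specify, and it uses a hand-derived gradient in place of backpropagation through a diffusion model. The names `correspondence_loss`, `correspondence_grad`, and `guidance_step` are hypothetical, as is the `step_size` parameter.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def correspondence_loss(edited, reference):
    """Toy correspondence loss: 1 - cosine similarity (an assumed choice).

    Both inputs are flat lists of floats and must be non-zero vectors.
    """
    norm_e = math.sqrt(_dot(edited, edited))
    norm_r = math.sqrt(_dot(reference, reference))
    return 1.0 - _dot(edited, reference) / (norm_e * norm_r)


def correspondence_grad(edited, reference):
    """Analytic gradient of the toy loss with respect to `edited`."""
    norm_e = math.sqrt(_dot(edited, edited))
    norm_r = math.sqrt(_dot(reference, reference))
    cos = _dot(edited, reference) / (norm_e * norm_r)
    # d(cos)/d(e_i) = r_i / (|e||r|) - cos * e_i / |e|^2; the loss is 1 - cos.
    return [
        -(r / (norm_e * norm_r) - cos * e / (norm_e ** 2))
        for e, r in zip(edited, reference)
    ]


def guidance_step(edited, reference, step_size=0.5):
    """One gradient step that pulls `edited` toward `reference` features."""
    grad = correspondence_grad(edited, reference)
    return [e - step_size * g for e, g in zip(edited, grad)]


if __name__ == "__main__":
    edited = [1.0, 0.0, 0.0]      # features at the edit target (made-up values)
    reference = [0.0, 1.0, 0.0]   # features of the source content (made-up values)
    before = correspondence_loss(edited, reference)
    after = correspondence_loss(guidance_step(edited, reference), reference)
    print(before, after)  # the loss should decrease after one step
```

In the method described above, the gradient would instead be taken with respect to the intermediate representation of the diffusion model. The sketch only illustrates the direction of the update on a single feature vector.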