Large-scale text-to-image models have demonstrated a remarkable ability to synthesize diverse, high-fidelity images. However, these models are often constrained by several limitations. First, they require the user to provide precise and contextually relevant descriptions of the desired image modifications. Second, current models can introduce significant changes to the original image content during editing. In this paper, we explore ReGeneration learning in an image-to-image Diffusion model (ReDiffuser), which preserves the content of the original image without human prompting; the required editing direction is discovered automatically within the text embedding space. To ensure that shape is preserved consistently during image editing, we propose cross-attention guidance based on regeneration learning. This approach enhances the expression of target-domain features while preserving the original shape of the image. In addition, we introduce a cooperative update strategy that preserves the original shape efficiently, improving the quality and consistency of shape preservation throughout the editing process. Our method leverages an existing pre-trained text-to-image diffusion model without any additional training. Extensive experiments show that the proposed method outperforms existing work in both real and synthetic image editing.
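
To make the notion of an editing direction in the text embedding space concrete, the following is a minimal sketch and not the procedure used by ReDiffuser. It assumes that a direction can be approximated as the difference between the mean embeddings of two groups of sentences, one describing the source concept and one describing the target concept. The function names, the 3-dimensional toy vectors, and the `strength` parameter are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: 3-dimensional vectors stand in for real text embeddings,
# and the mean-difference rule is an assumption, not ReDiffuser's procedure.

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def edit_direction(source_embeddings, target_embeddings):
    """Direction pointing from the source concept toward the target concept."""
    src = mean_vector(source_embeddings)
    tgt = mean_vector(target_embeddings)
    return [t - s for s, t in zip(src, tgt)]

def apply_edit(embedding, direction, strength=1.0):
    """Shift a conditioning embedding along the edit direction."""
    return [e + strength * d for e, d in zip(embedding, direction)]

# Hypothetical embeddings of sentences about a source and a target concept.
source_sentences = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
target_sentences = [[2.0, 4.0, 2.0], [2.0, 6.0, 2.0]]

direction = edit_direction(source_sentences, target_sentences)
edited = apply_edit([1.0, 1.0, 1.0], direction, strength=0.5)
```

In a real pipeline the vectors would come from the text encoder of the pre-trained diffusion model, and the shifted embedding would condition the denoising process in place of a user-written prompt.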