ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation

Large-scale text-to-image models have demonstrated a remarkable ability to synthesize diverse, high-fidelity images. However, these models are constrained by several limitations. First, they require the user to provide precise and contextually relevant descriptions of the desired image modifications. Second, current models can impose significant changes on the original image content during editing. In this paper, we explore ReGeneration learning in an image-to-image Diffusion model (ReDiffuser), which preserves the content of the original image without human prompting; the required editing direction is discovered automatically within the text embedding space. To ensure that shape is preserved consistently during image editing, we propose cross-attention guidance based on regeneration learning. This approach enhances the expression of target-domain features while preserving the original shape of the image. In addition, we introduce a cooperative update strategy that preserves the original shape efficiently, improving the quality and consistency of shape preservation throughout the editing process. Our proposed method leverages an existing pre-trained text-to-image diffusion model without any additional training. Extensive experiments show that the proposed method outperforms existing work in both real and synthetic image editing.
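The abstract does not specify how the editing direction is computed in the text embedding space. As a purely illustrative sketch, the snippet below assumes one common approach from zero-shot image translation: take the difference between the mean embedding of target-domain sentences and the mean embedding of source-domain sentences. The function names (`mean_embedding`, `edit_direction`, `apply_direction`) and the toy two-dimensional vectors are hypothetical and are not taken from the paper; real embeddings would come from the text encoder of a pre-trained diffusion model.

```python
from statistics import fmean
from typing import List, Sequence

Vector = List[float]


def mean_embedding(embeddings: Sequence[Vector]) -> Vector:
    """Average a set of equal-length embedding vectors component-wise."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    return [fmean(e[i] for e in embeddings) for i in range(dim)]


def edit_direction(source: Sequence[Vector], target: Sequence[Vector]) -> Vector:
    """Assumed direction: mean(target embeddings) - mean(source embeddings)."""
    s = mean_embedding(source)
    t = mean_embedding(target)
    return [ti - si for si, ti in zip(s, t)]


def apply_direction(embedding: Vector, direction: Vector, scale: float = 1.0) -> Vector:
    """Shift a text embedding along the edit direction by a chosen scale."""
    return [e + scale * d for e, d in zip(embedding, direction)]


# Toy stand-ins for sentence embeddings of a source and a target domain.
source_embs = [[1.0, 0.0], [3.0, 0.0]]
target_embs = [[0.0, 2.0], [0.0, 4.0]]

direction = edit_direction(source_embs, target_embs)
edited = apply_direction([2.0, 0.0], direction)
```

In a real pipeline, the shifted embedding would condition the diffusion model during regeneration, while a separate mechanism (cross-attention guidance, in the abstract's terms) keeps the shape of the original image.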

