The rapid advancement of pretrained text-driven diffusion models has significantly enriched applications in image generation and editing. However, as the demand for personalized content editing grows, new challenges emerge, especially when dealing with arbitrary objects and complex scenes. Existing methods usually mistake the mask for an object shape prior, which makes it difficult to achieve seamless integration. The widely used inversion-noise initialization also hinders identity consistency with the target object. To address these challenges, we propose a novel training-free framework that formulates personalized content editing as the optimization of edited images in the latent space, using diffusion models as energy-function guidance conditioned on reference text-image pairs. A coarse-to-fine strategy employs text energy guidance at the early stage to achieve a natural transition toward the target class, and then uses point-to-point feature-level image energy guidance to perform fine-grained appearance alignment with the target object. Additionally, we introduce latent-space content composition to enhance overall identity consistency with the target. Extensive experiments demonstrate that our method excels at object replacement even across a large domain gap, highlighting its potential for high-quality, personalized image editing.
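To make the coarse-to-fine energy-guided optimization concrete, the sketch below illustrates the overall loop under stated assumptions: `text_energy` and `image_feature_energy` are hypothetical toy stand-ins (the actual framework would derive them from a pretrained diffusion model's denoising score and intermediate UNet features, respectively), and the switch point, step count, and learning rate are illustrative placeholders, not the paper's settings.

```python
# Minimal sketch of coarse-to-fine latent-space optimization with energy
# guidance. All energy functions here are hypothetical stand-ins; the real
# method would compute them from a pretrained text-driven diffusion model.
import torch


def text_energy(latent: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    # Hypothetical text-conditioned energy: in the real framework this would
    # be a denoising (score-matching) loss from the diffusion model
    # conditioned on the target-class prompt. Here: a toy quadratic term.
    return ((latent.mean(dim=(-2, -1)) - text_emb) ** 2).sum()


def image_feature_energy(latent: torch.Tensor, ref_feat: torch.Tensor) -> torch.Tensor:
    # Hypothetical point-to-point feature energy: the real method would align
    # diffusion UNet features of the edited latent with those of the reference
    # object. Here: a plain L2 distance between latents as a toy stand-in.
    return ((latent - ref_feat) ** 2).mean()


def coarse_to_fine_edit(latent_init, text_emb, ref_feat,
                        n_steps=200, switch=0.5, lr=0.05):
    """Optimize the edited latent with energy guidance: text energy early
    (coarse transition toward the target class), then feature-level image
    energy (fine-grained appearance alignment with the target object)."""
    latent = latent_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    for step in range(n_steps):
        opt.zero_grad()
        if step < int(switch * n_steps):
            energy = text_energy(latent, text_emb)            # coarse stage
        else:
            energy = image_feature_energy(latent, ref_feat)   # fine stage
        energy.backward()
        opt.step()
    return latent.detach()


if __name__ == "__main__":
    # Toy usage with random tensors standing in for real latents/embeddings.
    torch.manual_seed(0)
    z0 = torch.randn(1, 4, 64, 64)   # source-image latent
    txt = torch.randn(1, 4)          # target-class text embedding (toy)
    ref = torch.randn(1, 4, 64, 64)  # reference-object features (toy)
    z_edit = coarse_to_fine_edit(z0, txt, ref)
    print(z_edit.shape)
```

The single `switch` hyperparameter (an assumption here) controls when guidance moves from the coarse text stage to the fine image stage; in the actual method this schedule would be coupled to the diffusion process rather than a fixed step fraction.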