Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, driven by recent advances in text-to-image generation. The research focus is now shifting toward the controllability of DMs. A significant challenge in this domain is localized editing, where specific areas of an image are modified without affecting the rest of the content. This paper introduces LIME for localized image editing in diffusion models. LIME does not require user-specified regions of interest (RoI) or additional text input; instead, it employs features from pre-trained methods and a straightforward clustering technique to obtain precise editing masks. Then, by leveraging cross-attention maps, it refines these segments to identify the regions to edit. Finally, we propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits. Our approach, without re-training, fine-tuning, or additional user inputs, consistently improves the performance of existing methods on various editing benchmarks. The project page can be found at https://enisimsar.github.io/LIME/.
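The cross-attention regularization described above can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's implementation): it assumes pre-softmax attention logits of shape (pixels, tokens), a boolean RoI mask, and the indices of the prompt tokens describing the edit, and it subtracts a penalty from unrelated tokens' logits inside the RoI before the softmax.

```python
import numpy as np

def regularize_cross_attention(attn_logits, roi_mask, edit_token_ids, alpha=10.0):
    """Hypothetical sketch of cross-attention regularization.

    attn_logits    : (num_pixels, num_tokens) pre-softmax attention scores
    roi_mask       : (num_pixels,) boolean, True inside the editing RoI
    edit_token_ids : indices of prompt tokens relevant to the edit
    alpha          : penalty subtracted from unrelated tokens inside the RoI
    """
    logits = attn_logits.copy()
    unrelated = np.ones(logits.shape[1], dtype=bool)
    unrelated[edit_token_ids] = False
    # Penalize unrelated tokens, but only at pixels inside the RoI,
    # so attention there concentrates on the edit-relevant tokens.
    logits[np.ix_(roi_mask, unrelated)] -= alpha
    # Standard softmax over the token axis.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

With uniform logits, pixels inside the RoI attend almost exclusively to the edit tokens after regularization, while pixels outside the RoI keep their original (uniform) attention distribution, which is the intended localization effect.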