Diffusion models (DMs) have gained prominence for their ability to generate high-quality, diverse images, driven in particular by recent advances in text-to-image generation. Research attention is now shifting toward the controllability of DMs. A significant challenge in this area is localized editing, in which specific regions of an image are modified without affecting the rest of its content. This paper introduces LIME, a method for localized image editing in diffusion models that requires neither user-specified regions of interest (RoI) nor additional text input. Our method uses features from pre-trained models and a simple clustering technique to obtain precise semantic segmentation maps. It then leverages cross-attention maps to refine these segments for localized edits. Finally, we propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring that edits remain localized. Without re-training or fine-tuning, our approach consistently improves the performance of existing methods on various editing benchmarks.
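To make the regularization idea concrete, here is a minimal sketch that assumes the penalty works as follows. A constant `alpha` is subtracted from the pre-softmax cross-attention scores of text tokens unrelated to the edit, and only for pixels inside the RoI. Pixels outside the RoI keep their original attention. The function name, the `alpha` value, and the list-of-lists layout are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of attention logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_cross_attention(
    scores: List[List[float]],      # [num_pixels][num_tokens] pre-softmax logits
    roi_mask: Sequence[bool],       # True if the pixel lies inside the RoI
    related_tokens: Sequence[int],  # token indices relevant to the edit
    alpha: float = 1e4,             # hypothetical penalty strength
) -> List[List[float]]:
    """Penalize unrelated tokens inside the RoI, then normalize each pixel's scores."""
    related = set(related_tokens)
    attention = []
    for pixel_scores, in_roi in zip(scores, roi_mask):
        if in_roi:
            # Inside the RoI: push scores of unrelated tokens toward zero attention.
            pixel_scores = [
                s if j in related else s - alpha
                for j, s in enumerate(pixel_scores)
            ]
        # Outside the RoI the scores are left untouched.
        attention.append(softmax(pixel_scores))
    return attention


if __name__ == "__main__":
    # Two pixels, three tokens; only pixel 0 is in the RoI, and token 2 is the edit token.
    attn = regularized_cross_attention(
        [[1.0, 2.0, 0.5], [1.0, 2.0, 0.5]], [True, False], [2]
    )
    print(attn)
```

Inside the RoI, nearly all attention mass shifts to the edit-related token, while pixels outside the RoI attend exactly as before. This is the sense in which such a penalty keeps an edit local. Subtracting a large finite constant, rather than masking with negative infinity, keeps every row well defined even if no token in it is marked as related.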