Adding objects to images based on text instructions is a challenging task in semantic image editing, requiring a balance between preserving the original scene and seamlessly integrating the new object in a fitting location. Despite extensive efforts, existing models often struggle with this balance, particularly with finding a natural location for the added object in complex scenes. We introduce Add-it, a training-free approach that extends the attention mechanisms of diffusion models to incorporate information from three key sources: the scene image, the text prompt, and the generated image itself. Our weighted extended-attention mechanism maintains structural consistency and fine details while ensuring natural object placement. Without task-specific fine-tuning, Add-it achieves state-of-the-art results on both real and generated image insertion benchmarks, including our newly constructed "Additing Affordance Benchmark" for evaluating object placement plausibility, outperforming supervised methods. Human evaluations show that Add-it is preferred in over 80% of cases, and it also demonstrates improvements across various automated metrics.
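To make the "weighted extended-attention" idea concrete, the following is a minimal, hypothetical sketch: the generated image's queries attend over keys and values concatenated from several sources (scene image, text prompt, generated image), with a per-source scalar weight folded into the attention logits as a log-bias before renormalization. All function and variable names here are illustrative, and this is not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def weighted_extended_attention(q, sources, weights):
    """
    q: (n_q, d) queries from the generated image tokens.
    sources: list of (K, V) pairs, one per information source
             (e.g. scene image, text prompt, generated image itself).
    weights: per-source positive scalars; scaling a source's attention
             probabilities by w and renormalizing is equivalent to
             adding log(w) to that source's logits.
    """
    d = q.shape[-1]
    ks = np.concatenate([k for k, _ in sources], axis=0)
    vs = np.concatenate([v for _, v in sources], axis=0)
    # one log-weight per key token, grouped by source
    log_w = np.concatenate(
        [np.full(k.shape[0], np.log(w)) for (k, _), w in zip(sources, weights)]
    )
    logits = (q @ ks.T) / np.sqrt(d) + log_w
    return softmax(logits, axis=-1) @ vs
```

With all weights set to 1, this reduces to plain attention over the concatenated key/value sets; raising the scene image's weight biases the output toward preserving scene content, while raising the prompt's weight biases it toward the instructed object.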