We introduce ObjectAdd, a training-free diffusion modification method that adds user-expected objects into a user-specified area. ObjectAdd is motivated by two observations: first, describing everything in one prompt can be difficult, and second, users often need to add objects to an already generated image. To suit real-world use, ObjectAdd accurately maintains image consistency after adding objects through three technical innovations: (1) embedding-level concatenation to ensure that the text embeddings coalesce correctly; (2) object-driven layout control with latent and attention injection to ensure that objects land in the user-specified area; (3) prompted image inpainting in an attention refocusing and object expansion fashion to ensure that the rest of the image stays the same. Given a text-prompted image, ObjectAdd allows users to specify a box and an object, and achieves: (1) adding the object inside the box area; (2) preserving the exact content outside the box area; (3) flawless fusion between the two areas.
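Goal (2), exact content outside the box, can be illustrated with a mask-based blend: values inside the user-specified box come from the edited result, and values outside it are copied unchanged from the original. The following is a minimal sketch under that assumption, not the ObjectAdd implementation; the helper names `box_mask` and `blend`, the box convention, and the toy 4x4 grids standing in for image latents are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
# Assumed box convention: (top, left, bottom, right), bottom/right exclusive.
Box = Tuple[int, int, int, int]


def box_mask(height: int, width: int, box: Box) -> List[List[int]]:
    """Return a binary mask that is 1 inside the box and 0 elsewhere."""
    top, left, bottom, right = box
    return [
        [1 if top <= r < bottom and left <= c < right else 0 for c in range(width)]
        for r in range(height)
    ]


def blend(original: Grid, edited: Grid, mask: List[List[int]]) -> Grid:
    """Take edited values where the mask is 1 and original values where it is 0."""
    return [
        [e if m else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


# Toy stand-ins for the original and edited latents (hypothetical data).
original = [[0.0] * 4 for _ in range(4)]
edited = [[1.0] * 4 for _ in range(4)]

mask = box_mask(4, 4, (1, 1, 3, 3))
result = blend(original, edited, mask)
```

A hard binary mask like this guarantees that everything outside the box is untouched, but by itself it does nothing for goal (3), the fusion between the two areas, which the abstract attributes to the attention refocusing and object expansion steps.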