The goal of image composition is to merge a foreground object into a background image to obtain a realistic composite image. Recent generative composition methods are built on large pretrained diffusion models because of their unprecedented image generation ability. However, they are weak at preserving foreground object details. Inspired by recent text-to-image generation customized for a specific object, we propose DreamCom, which treats image composition as text-guided image inpainting customized for a specific object. Specifically, we finetune a pretrained text-guided image inpainting model on a few reference images containing the same object, using a text prompt that contains a special token associated with this object. Then, given a new background, we can insert the object into the background using a text prompt containing the special token. In practice, the inserted object may be adversely affected by the background, so we propose masked attention mechanisms to avoid negative background interference. Experimental results on DreamEditBench and our contributed MureCom dataset show the outstanding performance of DreamCom.
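The abstract does not say where or how the masked attention is applied, so the following is only a minimal sketch of the general idea, not DreamCom's actual implementation. It assumes scaled dot-product attention in which a binary mask marks which key positions may be attended to (for example, positions inside the foreground region), so that masked-out background positions cannot influence the output. The function name `masked_attention` and the `key_mask` parameter are hypothetical, chosen for illustration.

```python
import math


def masked_attention(queries, keys, values, key_mask):
    """Scaled dot-product attention that ignores keys where key_mask is False.

    Illustrative assumption: key_mask[j] is True for positions that are allowed
    to contribute (e.g. foreground) and False for positions to suppress
    (e.g. background). This is a generic sketch, not the paper's exact design.
    """
    if not any(key_mask):
        raise ValueError("at least one key must be unmasked")
    dim = len(keys[0])
    value_dim = len(values[0])
    outputs = []
    for q in queries:
        # Masked keys receive a score of -inf, so softmax assigns them weight 0.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            if keep
            else -math.inf
            for k, keep in zip(keys, key_mask)
        ]
        # Numerically stable softmax over the scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(value_dim)]
        )
    return outputs


# Usage: the third key/value pair plays the role of a background position.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
values = [[1.0], [2.0], [100.0]]
result = masked_attention(queries, keys, values, key_mask=[True, True, False])
```

With the third position masked, the output is a convex combination of the first two values only, so changing the masked value has no effect on the result. In a diffusion inpainting model the same masking would be applied per attention layer over spatial positions, but which layers and which direction of masking to use are design choices the abstract leaves open.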