The goal of image composition is to merge a foreground object into a background image to obtain a realistic composite image. Recent generative composition methods are built on large pretrained diffusion models because of their unprecedented image generation ability. They train a model on abundant pairs of foregrounds and backgrounds so that it can be applied directly to a new foreground-background pair at test time. However, the generated results often lose foreground details and exhibit noticeable artifacts. In this work, we propose an embarrassingly simple approach named DreamCom, inspired by DreamBooth. Specifically, given a few reference images of a subject, we finetune a text-guided inpainting diffusion model to associate this subject with a special token and to inpaint this subject within a specified bounding box. We also construct a new dataset named MureCom, which is tailored to this task.
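As a rough illustration of the inputs such an approach works with, the following minimal sketch builds a binary inpainting mask from a bounding box and a text prompt containing a special token. It is not the authors' implementation: the token string `sks`, the prompt template, the `(x0, y0, x1, y1)` box convention, and the function names `bbox_to_mask` and `build_prompt` are all assumptions made for illustration, and the finetuning and sampling steps of the diffusion model are omitted.

```python
from typing import List, Tuple

# Hypothetical placeholder token; the abstract does not state which token is used.
SPECIAL_TOKEN = "sks"


def bbox_to_mask(height: int, width: int,
                 bbox: Tuple[int, int, int, int]) -> List[List[int]]:
    """Return a height x width binary mask: 1 inside the box, 0 elsewhere.

    The box is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive
    (an assumed convention, not taken from the source).
    """
    x0, y0, x1, y1 = bbox
    # Clamp the box to the image bounds so out-of-range boxes do not fail.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


def build_prompt(class_noun: str, token: str = SPECIAL_TOKEN) -> str:
    """Build a DreamBooth-style prompt that ties the subject to the token."""
    return f"a photo of {token} {class_noun}"


# Example: a 4x6 background with a box covering columns 1..3 and rows 1..2.
mask = bbox_to_mask(4, 6, (1, 1, 4, 3))
prompt = build_prompt("dog")
```

In a full pipeline, the background image, this mask, and the prompt would be passed to the finetuned text-guided inpainting model, which fills the masked region with the subject.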