We present Magic Insert, a method for dragging-and-dropping subjects from a user-provided image into a target image of a different style in a physically plausible manner while matching the style of the target image. This work formalizes the problem of style-aware drag-and-drop and presents a method for tackling it by addressing two sub-problems: style-aware personalization and realistic object insertion in stylized images. For style-aware personalization, our method first fine-tunes a pretrained text-to-image diffusion model using LoRA and learned text tokens on the subject image, and then infuses it with a CLIP representation of the target style. For object insertion, we use Bootstrapped Domain Adaptation to adapt a domain-specific photorealistic object insertion model to the domain of diverse artistic styles. Overall, the method significantly outperforms traditional approaches such as inpainting. Finally, we present a dataset, SubjectPlop, to facilitate evaluation and future progress in this area. Project page: https://magicinsert.github.io/
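The two-stage pipeline described above can be sketched conceptually as follows. This is a minimal illustrative skeleton, not the authors' implementation: every class, function, and value here is a hypothetical placeholder standing in for the real components (a LoRA-fine-tuned diffusion model, learned subject tokens, a CLIP style encoder, and a domain-adapted insertion model).

```python
# Conceptual sketch of the Magic Insert two-stage pipeline.
# All names and values are illustrative placeholders, not the authors' API.

from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PersonalizedModel:
    """Stand-in for a diffusion model personalized to one subject."""
    lora_adapter: str                 # LoRA weights learned on the subject image
    subject_tokens: List[str]         # learned text tokens identifying the subject
    style_embedding: List[float] = field(default_factory=list)  # CLIP style vector


def style_aware_personalize(subject_image: str, style_image: str) -> PersonalizedModel:
    """Stage 1: fine-tune on the subject, then infuse the target style (placeholder)."""
    model = PersonalizedModel(
        lora_adapter=f"lora[{subject_image}]",
        subject_tokens=[f"<{subject_image}>"],
    )
    # Placeholder for a CLIP image encoder; a real pipeline would embed style_image.
    model.style_embedding = [0.1, 0.2, 0.3]
    return model


def insert_subject(model: PersonalizedModel, target_image: str,
                   position: Tuple[int, int]) -> str:
    """Stage 2: place the stylized subject via a domain-adapted insertion model."""
    return (f"composite({target_image}, subject={model.subject_tokens[0]}, "
            f"at={position})")


result = insert_subject(
    style_aware_personalize("corgi.png", "watercolor.png"),
    target_image="scene.png",
    position=(128, 64),
)
print(result)
```

The key design point the sketch mirrors is the separation of concerns: Stage 1 produces a style-matched representation of the subject independently of the target scene, and Stage 2 handles only plausible placement, which is what allows the insertion model to be adapted across artistic domains.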