Effective editing of personal content plays a pivotal role in enabling individuals to express their creativity, weave captivating narratives within their visual stories, and elevate the overall quality and impact of their visual content. In this work, we therefore introduce SwapAnything, a novel framework that can swap any object in an image with a personalized concept given by a reference image, while keeping the context unchanged. Compared with existing methods for personalized subject swapping, SwapAnything has three unique advantages: (1) precise control of arbitrary objects and parts, rather than only the main subject; (2) more faithful preservation of context pixels; and (3) better adaptation of the personalized concept to the image. First, we propose targeted variable swapping, which applies region control over latent feature maps and swaps masked variables to preserve the context faithfully and perform an initial swap of the semantic concept. Then, we introduce appearance adaptation to seamlessly adapt the semantic concept to the original image in terms of target location, shape, style, and content during the image generation process. Extensive human and automatic evaluations demonstrate significant improvements of our approach over baseline methods on personalized swapping. Furthermore, SwapAnything shows precise and faithful swapping across single-object, multiple-object, partial-object, and cross-domain swapping tasks. SwapAnything also performs well on text-based swapping and on tasks beyond swapping, such as object insertion.
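The core idea behind swapping masked variables can be illustrated with a simple masked blend of two feature maps: inside the mask, values come from the map carrying the new concept; outside it, values are copied from the original image's map so the context is preserved exactly. The sketch below is a simplified illustration under stated assumptions, not the SwapAnything implementation: it assumes a single-channel 2-D feature map, a binary mask, and a hypothetical function name `swap_masked_variables`. In latent-diffusion editing methods, a blend of this kind is typically repeated at each denoising step.

```python
from typing import List

Matrix = List[List[float]]


def swap_masked_variables(source: Matrix, generated: Matrix, mask: List[List[int]]) -> Matrix:
    """Hypothetical helper: blend two equally sized 2-D feature maps with a binary mask.

    Cells where mask == 1 take the value from `generated` (the region being swapped);
    cells where mask == 0 keep the value from `source` (the context to preserve).
    """
    if not (len(source) == len(generated) == len(mask)):
        raise ValueError("feature maps and mask must have the same height")
    out: Matrix = []
    for src_row, gen_row, m_row in zip(source, generated, mask):
        if not (len(src_row) == len(gen_row) == len(m_row)):
            raise ValueError("feature maps and mask must have the same width")
        # Per-cell blend: m * generated + (1 - m) * source.
        out.append([m * g + (1 - m) * s for s, g, m in zip(src_row, gen_row, m_row)])
    return out


# Usage: swap only the right-hand column of a 2x2 map.
source = [[1.0, 2.0], [3.0, 4.0]]
generated = [[9.0, 9.0], [9.0, 9.0]]
mask = [[0, 1], [0, 1]]
print(swap_masked_variables(source, generated, mask))  # [[1.0, 9.0], [3.0, 9.0]]
```

Because unmasked cells are copied verbatim from `source`, the context is reproduced exactly. Appearance adaptation, which this sketch does not model, is what the abstract credits with integrating the swapped concept with its surroundings.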