With the great success of text-conditioned diffusion models in creative text-to-image generation, various text-driven image editing approaches have attracted the attention of many researchers. However, previous works mainly focus on discreteness-sensitive instructions such as adding, removing, or replacing specific objects, background elements, or global styles (i.e., hard editing), while generally ignoring continuity-sensitive instructions that are bound to a subject but change its semantics only finely, such as actions, poses, or adjectives (i.e., soft editing). This gap hinders generative AI from producing user-customized visual content. To mitigate this predicament, we propose AdapEdit, a spatio-temporal guided adaptive editing algorithm that realizes adaptive image editing by introducing a soft-attention strategy to dynamically vary the degree of guidance from the editing conditions to visual pixels from both temporal and spatial perspectives. Notably, our approach has a significant advantage in preserving model priors and does not require model training, fine-tuning, extra data, or optimization. We present results over a wide variety of raw images and editing instructions, demonstrating competitive performance and showing that it significantly outperforms previous approaches.