Text-conditioned image editing has emerged as a powerful tool for modifying images. However, in many situations language is ambiguous and ineffective at describing a specific edit. In such cases, visual prompts can be a more informative and intuitive way to convey the intended change. We present a method for image editing via visual prompting. Given pairs of examples that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that with just one example pair, we achieve results competitive with state-of-the-art text-conditioned image editing frameworks.
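The idea of inverting a before/after pair into an editing instruction can be illustrated with a deliberately small toy. The sketch below is not the method itself: the abstract does not specify the model, loss, or optimizer, so everything here is an assumption made for illustration. A fixed linear map `W` stands in for the frozen pretrained editor, images are plain vectors, and the hypothetical helpers `edit` and `invert_visual_prompt` recover an instruction vector from a single example pair by gradient descent, then reuse it on a new input.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_IMAGE, DIM_INSTR = 16, 4

# Frozen stand-in for a pretrained editor (an assumption, not a diffusion model):
# edit(x, c) = x + W @ c. W has orthonormal columns so the toy is well conditioned.
W, _ = np.linalg.qr(rng.normal(size=(DIM_IMAGE, DIM_INSTR)))


def edit(image, instruction):
    """Apply the frozen toy editor to an image vector."""
    return image + W @ instruction


def invert_visual_prompt(before, after, steps=200, lr=0.1):
    """Learn an instruction vector from one before/after pair.

    Minimizes ||edit(before, c) - after||^2 over c with plain gradient
    descent while the editor itself stays fixed.
    """
    instruction = np.zeros(DIM_INSTR)
    for _ in range(steps):
        residual = edit(before, instruction) - after
        grad = 2.0 * W.T @ residual  # gradient of the squared error w.r.t. c
        instruction -= lr * grad
    return instruction


# One example pair produced by an instruction the learner never sees.
true_instruction = rng.normal(size=DIM_INSTR)
before = rng.normal(size=DIM_IMAGE)
after = edit(before, true_instruction)

learned = invert_visual_prompt(before, after)

# Reuse the learned instruction on a new image and compare with the true edit.
new_image = rng.normal(size=DIM_IMAGE)
error = float(np.max(np.abs(edit(new_image, learned) - edit(new_image, true_instruction))))
print(f"max abs error on new image: {error:.2e}")
```

In this toy the edit is exactly recoverable because the stand-in editor is linear. With a real diffusion model the objective, the embedding space, and the optimization details would all differ, and a single pair would constrain the instruction far less tightly.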