Text-guided image editing finds applications in a wide range of creative and practical fields. While recent image generation methods have advanced the field, they often struggle with the dual challenge of transforming an image coherently while preserving its context. In response, our work introduces prompt augmentation, a method that expands a single input prompt into several target prompts, strengthening the textual context and enabling localised image editing. Specifically, we use the augmented prompts to delineate the intended manipulation area. We propose a Contrastive Loss tailored to drive effective image editing by pushing edited regions apart and pulling preserved regions closer together. Acknowledging the continuous nature of image manipulations, we further refine our approach by incorporating the concept of similarity, yielding a Soft Contrastive Loss. The new losses are incorporated into the diffusion model, which achieves improved or competitive image editing results compared with state-of-the-art approaches on public datasets and generated images.
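To make the push/pull behaviour concrete, the following NumPy sketch implements a generic margin-based contrastive loss over per-location features, together with a soft variant that uses continuous weights. It is an illustration under stated assumptions, not the paper's formulation. The per-location feature arrays, the edit mask (assumed here to be derived from the augmented prompts, for example via attention maps), the Euclidean distance, the `margin` parameter and all function names are hypothetical choices made for this sketch.

```python
import numpy as np


def _distances(src_feats, edit_feats):
    # Per-location Euclidean distance between source and edited features.
    # Both inputs are assumed to have shape (N, C): N spatial locations, C channels.
    return np.linalg.norm(src_feats - edit_feats, axis=-1)


def soft_contrastive_loss(src_feats, edit_feats, edit_weight, margin=1.0):
    """Hypothetical soft contrastive loss.

    edit_weight has shape (N,) with values in [0, 1]: 1 means "edit this
    location", 0 means "preserve it", and values in between blend both terms.
    """
    w = np.clip(np.asarray(edit_weight, dtype=float), 0.0, 1.0)
    d = _distances(src_feats, edit_feats)
    pull = (1.0 - w) * d ** 2                       # preserved: keep features close
    push = w * np.maximum(0.0, margin - d) ** 2     # edited: push beyond the margin
    return float(np.mean(pull + push))


def contrastive_loss(src_feats, edit_feats, edit_mask, margin=1.0):
    """Hard (binary-mask) contrastive loss: the soft loss restricted to {0, 1} weights."""
    mask = np.asarray(edit_mask, dtype=float)
    if not np.all((mask == 0.0) | (mask == 1.0)):
        raise ValueError("edit_mask must be binary; use soft_contrastive_loss otherwise")
    return soft_contrastive_loss(src_feats, edit_feats, mask, margin)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(4, 8))
    mask = np.array([1, 1, 0, 0])            # first two locations should change

    unchanged = src.copy()                   # nothing edited yet
    print(contrastive_loss(src, unchanged, mask))    # edited locations are penalised

    edited = src.copy()
    edited[:2] += 5.0                        # edit only the masked locations
    print(contrastive_loss(src, edited, mask))       # no penalty remains
```

In this sketch the hard loss is simply the soft loss with binary weights, which mirrors the abstract's description of the Soft Contrastive Loss as a refinement that accounts for the continuous nature of image manipulations.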