The recent success of text-to-image diffusion models has also transformed semantic image editing, enabling images to be manipulated according to query/target texts. Despite these advances, a significant challenge remains: pre-trained models can introduce contextual prior bias during image editing, e.g., making unexpected modifications to inappropriate regions. To address this issue, we present Dual-Cycle Diffusion, a novel approach that generates an unbiased mask to guide image editing. The proposed model incorporates a Bias Elimination Cycle consisting of a forward path and an inverted path, each featuring a Structural Consistency Cycle that preserves image content during editing. The forward path uses the pre-trained model to produce the edited image, while the inverted path converts the result back to the source image. The unbiased mask is generated by comparing the differences between the processed source image and the edited image to ensure that both conform to the same distribution. Our experiments demonstrate the effectiveness of the proposed method, which improves the D-CLIP score from 0.272 to 0.283. The code will be available at https://github.com/JohnDreamer/DualCycleDiffsion.
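The idea of deriving a mask from the differences between two images, then using it to restrict where an edit applies, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how differences are measured or how the mask is applied, so the per-pixel absolute difference, the `threshold` value, the hard blending rule, and the function names `difference_mask` and `apply_mask` are all assumptions made for illustration. Images are represented as small grayscale grids with values in [0, 1], and the raw source stands in for the processed source image.

```python
from typing import List

Image = List[List[float]]  # grayscale image, values in [0, 1] (illustrative)
Mask = List[List[int]]     # 1 = pixel may be edited, 0 = pixel is preserved


def difference_mask(processed_source: Image, edited: Image,
                    threshold: float = 0.1) -> Mask:
    """Mark pixels whose absolute difference exceeds `threshold`.

    The threshold rule is an assumption; the abstract only states that the
    mask comes from comparing the two images.
    """
    return [
        [1 if abs(s - e) > threshold else 0 for s, e in zip(src_row, edit_row)]
        for src_row, edit_row in zip(processed_source, edited)
    ]


def apply_mask(source: Image, edited: Image, mask: Mask) -> Image:
    """Keep edited pixels inside the mask and source pixels outside it
    (a hypothetical hard-blending rule)."""
    return [
        [e if m else s for s, e, m in zip(src_row, edit_row, mask_row)]
        for src_row, edit_row, mask_row in zip(source, edited, mask)
    ]


# Toy 2x3 images: the top-left pixel changes only slightly (0.2 -> 0.25),
# which stands in for an unintended modification outside the target region.
source = [[0.2, 0.2, 0.8],
          [0.2, 0.2, 0.8]]
edited = [[0.25, 0.9, 0.8],
          [0.2, 0.85, 0.3]]

mask = difference_mask(source, edited)
result = apply_mask(source, edited, mask)
```

In this toy case the small change at the top-left pixel falls below the threshold, so that pixel is restored from the source while the three strongly changed pixels keep their edited values.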