The recent success of text-to-image diffusion models has also revolutionized semantic image editing, enabling images to be manipulated according to query/target texts. Despite these advances, a significant challenge remains: pre-trained models can introduce prior bias during image editing, e.g., by making unexpected modifications to inappropriate regions. To this end, we present a novel Dual-Cycle Diffusion model that addresses prior bias by generating an unbiased mask to guide image editing. The proposed model incorporates a Bias Elimination Cycle consisting of a forward path and an inverted path, each featuring a Structural Consistency Cycle that preserves image content during editing. The forward path uses the pre-trained model to produce the edited image, while the inverted path converts the result back to the source image. The unbiased mask is generated by comparing the differences between the processed source image and the edited image to ensure that both conform to the same distribution. Our experiments demonstrate the effectiveness of the proposed method, which significantly improves the D-CLIP score from 0.272 to 0.283. The code will be available at https://github.com/JohnDreamer/DualCycleDiffsion.
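As a rough illustration of the mask-guided editing idea described in the abstract, the sketch below derives a binary mask from the per-pixel difference between a processed source image and an edited image, then uses that mask to restrict changes to the differing region. It is only a toy under stated assumptions: the function names, the channel-averaged absolute difference, the fixed threshold, and the linear blending are all hypothetical choices made for this example, and the abstract does not specify how the actual method computes or applies its mask.

```python
import numpy as np


def difference_mask(processed_source, edited, threshold=0.1):
    """Return a binary mask (H, W) marking pixels where the two images differ.

    Assumption: the difference is the channel-averaged absolute difference,
    and `threshold` is a hypothetical cut-off chosen for illustration.
    """
    diff = np.abs(edited.astype(np.float64) - processed_source.astype(np.float64))
    per_pixel = diff.mean(axis=-1)  # average over the channel axis
    return (per_pixel > threshold).astype(np.float64)


def apply_mask(source, edited, mask):
    """Keep edited pixels inside the mask and source pixels outside it.

    Assumption: simple linear blending stands in for mask-guided editing.
    """
    m = mask[..., None]  # broadcast the mask over the channel axis
    return m * edited + (1.0 - m) * source


# Toy data: a black 4x4 RGB image whose central 2x2 patch is edited to white.
# Here the "processed source" is simply the source itself (an assumption).
source = np.zeros((4, 4, 3))
edited = source.copy()
edited[1:3, 1:3] = 1.0

mask = difference_mask(source, edited)
result = apply_mask(source, edited, mask)
```

In this toy case the mask covers only the edited patch, so pixels outside it are taken unchanged from the source, which mirrors the stated goal of avoiding modifications to inappropriate regions.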