Text-guided image manipulation has recently received broad attention. In this study, we focus on regional editing of images guided by text prompts. Unlike current mask-based image editing methods, we propose a novel region-aware diffusion model (RDM) for entity-level image editing, which can automatically locate the region of interest and replace it according to the given text prompt. To strike a balance between image fidelity and inference speed, we design an intensive diffusion pipeline by combining latent space diffusion with enhanced directional guidance. In addition, to preserve image content in non-edited regions, we introduce region-aware entity editing, which modifies the region of interest while leaving the rest of the image unchanged. We validate the proposed RDM against baseline methods through extensive qualitative and quantitative experiments. The results show that RDM outperforms previous approaches in visual quality, overall harmonization, content preservation in non-edited regions, and text-image semantic consistency. The code is available at https://github.com/haha-lisa/RDM-Region-Aware-Diffusion-Model.
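The idea of editing only the region of interest while preserving everything else can be illustrated with a simple mask blend. The sketch below is an assumption for illustration only, not the RDM implementation: it works on plain pixel arrays instead of diffusion latents, and the function name `blend_with_mask` and the toy arrays are hypothetical. It assumes a binary mask is already available, whereas RDM locates the region automatically.

```python
import numpy as np


def blend_with_mask(original, edited, mask):
    """Take `edited` where mask == 1 and keep `original` where mask == 0.

    Hypothetical helper for illustration; not taken from the RDM code base.
    """
    original = np.asarray(original, dtype=np.float32)
    edited = np.asarray(edited, dtype=np.float32)
    mask = np.asarray(mask, dtype=np.float32)
    # A 2D (H, W) mask is expanded so it broadcasts over the channel axis.
    if mask.ndim == original.ndim - 1:
        mask = mask[..., None]
    return mask * edited + (1.0 - mask) * original


# Toy data: a black "original", a white "edited" image, and a 2x2 region of interest.
original = np.zeros((4, 4, 3), dtype=np.float32)
edited = np.ones((4, 4, 3), dtype=np.float32)
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0

result = blend_with_mask(original, edited, mask)
```

In this toy case, pixels inside the 2x2 region come from `edited` and all other pixels are copied from `original` unchanged, which is the content-preservation property described for non-edited regions.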