Denoising diffusion models have emerged as a powerful tool for image generation and editing, synthesizing visual content either unconditionally or conditioned on user input. Their core idea is to learn to reverse a process that gradually adds noise to images, which lets them generate high-quality samples from complex distributions. In this survey, we provide a comprehensive overview of existing diffusion-based image editing methods, covering both theoretical and practical aspects of the field. We analyze and categorize these works from multiple perspectives, including learning strategies, user-input conditions, and the range of specific editing tasks they can accomplish. We pay special attention to image inpainting and outpainting, examining both earlier context-driven methods and current multimodal conditional methods and analyzing their methodologies in detail. To further evaluate text-guided image editing algorithms, we propose a systematic benchmark, EditEval, which features an innovative metric, LMM Score. Finally, we discuss current limitations and outline potential directions for future research. The accompanying repository is available at https://github.com/SiatMMLab/Awesome-Diffusion-Model-Based-Image-Editing-Methods.
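To make the core idea concrete, the following is a minimal sketch of the noising process that a diffusion model learns to reverse. The abstract does not specify a formulation, so the sketch assumes the common DDPM-style closed form $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with a linear variance schedule. All function names, the schedule endpoints, and the four-pixel toy "image" are illustrative assumptions, not taken from the survey. In place of a trained network, the true noise is fed back in to show that a perfect noise estimate recovers the original image.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an assumed, DDPM-style choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: the signal variance kept at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Sample x_t from N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I).

    Returns the noisy pixels and the Gaussian noise that was mixed in.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    xt = [signal * p + noise * e for p, e in zip(x0, eps)]
    return xt, eps


def predict_x0(xt, eps_hat, alpha_bar):
    """Invert the noising equation given a noise estimate eps_hat."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - noise * e) / signal for x, e in zip(xt, eps_hat)]


rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]  # hypothetical toy "image": four pixels in [-1, 1]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

t = 500
xt, eps = add_noise(x0, alpha_bars[t], rng)
# A trained model would predict eps from (xt, t); the true noise stands in here.
x0_rec = predict_x0(xt, eps, alpha_bars[t])
```

In an actual editing pipeline the noise estimate comes from a neural network, typically conditioned on a text prompt, mask, or reference image, and the reversal is carried out over many small steps rather than in one jump.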