Denoising diffusion models have recently gained prominence as powerful tools for a variety of image generation and manipulation tasks. Building on this, we propose a novel tool for real-time image editing that gives users fine-grained, region-targeted supervision in addition to existing prompt-based controls. Our editing technique, termed Layered Diffusion Brushes, leverages prompt-guided and region-targeted alteration of intermediate denoising steps, enabling precise modifications while maintaining the integrity and context of the input image. We provide an editor built on Layered Diffusion Brushes that incorporates well-known image editing concepts such as layer masks, visibility toggles, and independent manipulation of layers regardless of their order. Our system renders a single edit on a 512x512 image within 140 ms on a high-end consumer GPU, enabling real-time feedback and rapid exploration of candidate edits. We validated our method and editing system through a user study involving both natural images (using inversion) and generated images, showcasing its usability and effectiveness for refining images compared to existing techniques such as InstructPix2Pix and Stable Diffusion Inpainting. Our approach is effective across a range of tasks, including object attribute adjustments, error correction, and sequential prompt-based object placement and manipulation, which highlights its versatility and potential for enhancing creative workflows.
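To make the layer concepts above concrete, the following is a minimal sketch of order-independent, mask-restricted compositing with visibility toggles. It is not the implementation described in this paper: the `Layer` class, the `composite` function, and the additive-delta formulation are all assumptions chosen for illustration, and the arrays stand in for intermediate latents that a real system would obtain from a diffusion model.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Layer:
    """Hypothetical edit layer (illustrative; not the paper's data structure)."""
    mask: np.ndarray      # (H, W) brush mask with values in [0, 1]
    edited: np.ndarray    # (C, H, W) stand-in for an edited intermediate latent
    visible: bool = True  # visibility toggle


def composite(base: np.ndarray, layers: list) -> np.ndarray:
    """Add each visible layer's masked change relative to the base latent.

    Every layer contributes mask * (edited - base), so the result is a sum
    of per-layer deltas and does not depend on the order of the layers.
    """
    out = base.astype(np.float64)  # astype returns a copy; base is not modified
    for layer in layers:
        if layer.visible:
            # mask[None] broadcasts the (H, W) mask over the channel axis
            out += layer.mask[None] * (layer.edited - base)
    return out


# Two layers with disjoint masks on a small stand-in latent.
base = np.zeros((4, 8, 8))
left = np.zeros((8, 8))
left[:, :4] = 1.0
right = np.zeros((8, 8))
right[:, 4:] = 1.0

layer_a = Layer(mask=left, edited=np.full((4, 8, 8), 1.0))
layer_b = Layer(mask=right, edited=np.full((4, 8, 8), 2.0))

forward = composite(base, [layer_a, layer_b])
backward = composite(base, [layer_b, layer_a])
```

In this sketch, hiding a layer simply drops its delta, so the regions covered by the other layers are unaffected. Storing edits as deltas against a shared base is one simple way to obtain order independence; the actual system may achieve it differently.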