Diffusion models have demonstrated outstanding performance in generative tasks, making them ideal candidates for image editing. Recent studies highlight their ability to apply desired edits effectively by following textual instructions, yet two key challenges remain. First, these models struggle to apply multiple edits simultaneously, resulting in computational inefficiencies due to their reliance on sequential processing. Second, relying on textual prompts to determine the editing region can lead to unintended alterations elsewhere in the image. We introduce FunEditor, an efficient diffusion model designed to learn atomic editing functions and perform complex edits by aggregating simpler functions. This approach enables complex editing tasks, such as object movement, by combining multiple atomic functions and applying them simultaneously to specific regions. Our experiments demonstrate that FunEditor significantly outperforms recent inference-time optimization methods and fine-tuned models on complex tasks such as object movement and object pasting, whether measured quantitatively across various metrics, through visual comparisons, or both. Moreover, with only 4 inference steps, FunEditor achieves 5-24x inference speedups over existing popular methods. The code is available at: mhmdsmdi.github.io/funeditor/.
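The core idea of decomposing a complex edit into atomic functions applied simultaneously to specific regions can be sketched conceptually as follows. This is an illustrative toy in NumPy, not the paper's actual diffusion-based method; all function names and the mask-based composition scheme here are assumptions made for exposition.

```python
# Conceptual sketch (NOT the paper's implementation): a complex edit is
# expressed as a set of (target mask, atomic function) pairs that are
# aggregated into one composite edit, rather than one full editing pass
# per instruction run sequentially.
import numpy as np

def atomic_edit(image, mask, fn):
    """Apply an atomic editing function only inside the masked region."""
    edited = fn(image)
    # Broadcast the 2D boolean mask over the channel axis.
    return np.where(mask[..., None], edited, image)

def aggregate_edits(image, edits):
    """Aggregate several atomic edits into one composite edit."""
    out = image
    for mask, fn in edits:
        out = atomic_edit(out, mask, fn)
    return out

# Toy example: "object movement" decomposed into two simpler functions,
# erasing the object at its source and pasting it at the target.
h, w = 8, 8
img = np.zeros((h, w, 3))
img[2:4, 2:4] = 1.0                      # a white "object"

src = np.zeros((h, w), dtype=bool); src[2:4, 2:4] = True
dst = np.zeros((h, w), dtype=bool); dst[5:7, 5:7] = True

moved = aggregate_edits(img, [
    (src, np.zeros_like),                # atomic: erase the source region
    (dst, np.ones_like),                 # atomic: paste at the target region
])
```

In this toy, the two atomic functions touch only their own masked regions, mirroring how mask-scoped edits avoid the unintended global alterations that purely text-driven region selection can cause.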