Diffusion models have demonstrated strong performance in generative tasks, making them ideal candidates for image editing. Recent studies highlight their ability to apply desired edits effectively by following textual instructions, yet two key challenges persist. First, these models struggle to apply multiple edits simultaneously, and their reliance on sequential processing results in computational inefficiency. Second, relying on textual prompts to determine the editing region can lead to unintended alterations in other parts of the image. In this work, we introduce FunEditor, an efficient diffusion model designed to learn atomic editing functions and to perform complex edits by aggregating simpler ones. This approach enables complex editing tasks, such as object movement, by composing multiple functions and applying them simultaneously to specific areas. On complex tasks such as object movement, FunEditor achieves 5 to 24 times faster inference than existing methods. Our experiments demonstrate that FunEditor significantly outperforms recent baselines, including both inference-time optimization methods and fine-tuned models, across various metrics such as image quality assessment (IQA) and object-background consistency.