Diffusion models have demonstrated outstanding performance in generative tasks, making them ideal candidates for image editing. Recent studies highlight their ability to apply desired edits effectively by following textual instructions, yet two key challenges remain. First, these models struggle to apply multiple edits simultaneously, resulting in computational inefficiencies due to their reliance on sequential processing. Second, relying on textual prompts to determine the editing region can lead to unintended alterations elsewhere in the image. We introduce FunEditor, an efficient diffusion model designed to learn atomic editing functions and perform complex edits by aggregating simpler functions. This approach enables complex editing tasks, such as object movement, by combining multiple atomic functions and applying them simultaneously to specific regions. Our experiments demonstrate that FunEditor significantly outperforms recent inference-time optimization methods and fine-tuned models on complex tasks such as object movement and object pasting, whether measured quantitatively across various metrics, through visual comparisons, or both. Moreover, with only 4 inference steps, FunEditor achieves 5-24x inference speedups over existing popular methods. The code is available at: mhmdsmdi.github.io/funeditor/.
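The core idea of decomposing a complex edit into atomic functions applied simultaneously to specific regions can be sketched conceptually as follows. This is an illustrative toy in NumPy, not the paper's actual diffusion-based method; all function names and the mask-based composition scheme here are assumptions made for exposition.

```python
# Conceptual sketch (NOT the paper's implementation): a complex edit is
# expressed as a set of (target mask, atomic function) pairs that are
# aggregated into one composite edit, rather than one full editing pass
# per instruction run sequentially.
import numpy as np

def atomic_edit(image, mask, fn):
    """Apply an atomic editing function only inside the masked region."""
    edited = fn(image)
    # Broadcast the 2D boolean mask over the channel axis.
    return np.where(mask[..., None], edited, image)

def aggregate_edits(image, edits):
    """Aggregate several atomic edits into one composite edit."""
    out = image
    for mask, fn in edits:
        out = atomic_edit(out, mask, fn)
    return out

# Toy example: "object movement" decomposed into two simpler functions,
# erasing the object at its source and pasting it at the target.
h, w = 8, 8
img = np.zeros((h, w, 3))
img[2:4, 2:4] = 1.0                      # a white "object"

src = np.zeros((h, w), dtype=bool); src[2:4, 2:4] = True
dst = np.zeros((h, w), dtype=bool); dst[5:7, 5:7] = True

moved = aggregate_edits(img, [
    (src, np.zeros_like),                # atomic: erase the source region
    (dst, np.ones_like),                 # atomic: paste at the target region
])
```

In this toy, the two atomic functions touch only their own masked regions, mirroring how mask-scoped edits avoid the unintended global alterations that purely text-driven region selection can cause.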