Text-based image editing has advanced significantly in recent years. With the rise of diffusion models, editing images via textual instructions has become ubiquitous. However, current models cannot customize the amount of change per pixel or per image fragment: they either change the entire image by an equal amount or edit a specific region selected by a binary mask. In this paper, we propose a new framework that lets the user customize the amount of change for each image fragment, thereby enhancing the flexibility and expressiveness of modern diffusion models. Our framework requires no model training or fine-tuning; it operates entirely at inference time, making it easily applicable to existing models. We show both qualitatively and quantitatively that our method offers better controllability and can produce results that are unattainable with existing models. Our code is available at: https://github.com/exx8/differential-diffusion
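To make the contrast between a binary mask and a per-fragment amount of change concrete, the following is a minimal sketch and not the implementation from the repository above. It assumes, purely for illustration, that a continuous change map with values in [0, 1] is converted into one boolean mask per inference step by thresholding, so that pixels with larger values stay editable for more steps. The names `editable_masks` and `editable_step_count` and the linear threshold schedule are hypothetical.

```python
import numpy as np


def editable_masks(change_map: np.ndarray, num_steps: int) -> np.ndarray:
    """Return a (num_steps, H, W) boolean array.

    True means the pixel is allowed to change at that step. This schedule is
    an illustrative assumption, not the method described in the paper.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    # Clamp the map so every value is a valid "amount of change" in [0, 1].
    change_map = np.clip(np.asarray(change_map, dtype=np.float64), 0.0, 1.0)
    # Thresholds fall linearly: step k uses threshold 1 - (k + 1) / num_steps.
    thresholds = 1.0 - (np.arange(num_steps) + 1) / num_steps
    # Broadcast (1, H, W) against (num_steps, 1, 1) to get one mask per step.
    return change_map[None, :, :] > thresholds[:, None, None]


def editable_step_count(change_map: np.ndarray, num_steps: int) -> np.ndarray:
    """Count, per pixel, the number of steps during which it may change."""
    return editable_masks(change_map, num_steps).sum(axis=0)


if __name__ == "__main__":
    # One row of three pixels: keep, change moderately, change fully.
    demo_map = np.array([[0.0, 0.5, 1.0]])
    print(editable_step_count(demo_map, num_steps=4))
```

Under this assumed schedule, a map containing only 0 and 1 reduces to an ordinary binary mask (a pixel is editable either at no step or at every step), while intermediate values give intermediate amounts of change.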