iEdit: Localised Text-guided Image Editing with Weak Supervision

Diffusion models (DMs) trained on large-scale datasets can generate realistic images under text guidance. However, they offer limited control over the output space of the generated images. We propose a novel learning method for text-guided image editing, iEdit, that generates images conditioned on a source image and a textual edit prompt. Because no fully annotated dataset with target images exists, previous approaches either perform subject-specific fine-tuning at test time or adopt contrastive learning without a target image, which makes it difficult to preserve the fidelity of the source image. We automatically construct a dataset derived from LAION-5B that contains, for each input image-caption pair, a pseudo-target image and a descriptive edit prompt. This dataset lets us introduce a weakly-supervised loss function that generates the pseudo-target image from the latent noise of the source image, conditioned on the edit prompt. To encourage localised editing and to preserve or modify spatial structures in the image, we propose a loss function that uses segmentation masks to guide the editing during training and, optionally, at inference. Our model is trained on the constructed dataset of 200K samples with constrained GPU resources. It compares favourably with its counterparts in image fidelity and CLIP alignment score, and qualitatively, when editing both generated and real images.
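
The abstract does not give the form of the mask-guided loss, so the following is only a generic sketch of how a segmentation mask can localise a reconstruction objective, not the formulation used by iEdit. The function name `masked_edit_loss`, the `preserve_weight` parameter, and the two-term structure (match the pseudo-target inside the mask, match the source outside it) are all assumptions made for illustration. Inputs are flattened lists of floats so that the example needs no external libraries.

```python
from typing import Sequence


def masked_edit_loss(
    pred: Sequence[float],
    target: Sequence[float],
    source: Sequence[float],
    mask: Sequence[float],
    preserve_weight: float = 1.0,
) -> float:
    """Hypothetical mask-weighted squared-error loss (illustration only).

    Where mask is 1 (region to edit), the prediction is pulled towards the
    pseudo-target; where mask is 0 (region to keep), it is pulled towards
    the source. This is an assumed form, not the loss from the paper.
    """
    if not (len(pred) == len(target) == len(source) == len(mask)):
        raise ValueError("all inputs must have the same length")
    if len(pred) == 0:
        raise ValueError("inputs must be non-empty")

    edit_term = 0.0      # error inside the masked (edited) region
    preserve_term = 0.0  # error outside the mask (preserved region)
    for p, t, s, m in zip(pred, target, source, mask):
        edit_term += m * (p - t) ** 2
        preserve_term += (1.0 - m) * (p - s) ** 2

    # Average over all elements so the value does not depend on input size.
    return (edit_term + preserve_weight * preserve_term) / len(pred)


if __name__ == "__main__":
    # Two "pixels": the first lies inside the edit mask, the second outside.
    loss = masked_edit_loss(
        pred=[0.0, 0.0],
        target=[2.0, 0.0],
        source=[0.0, 3.0],
        mask=[1.0, 0.0],
    )
    print(loss)
```

Setting `preserve_weight` to 0 in this sketch removes the penalty outside the mask, which illustrates the trade-off between following the edit prompt and keeping the source image intact.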


