ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing

Instruction-based image editing aims to equip a generative model with the ability to follow human-written instructions for editing images. Current approaches typically comprehend explicit and specific instructions. However, they often lack the active reasoning needed to understand instructions that are implicit or insufficiently defined. To enhance active reasoning and impart intelligence to the editing model, we introduce ReasonPix2Pix, a comprehensive reasoning-attentive instruction editing dataset. The dataset is characterized by 1) reasoning instructions, 2) more realistic images from fine-grained categories, and 3) increased variance between input and edited images. When fine-tuned on our dataset in a supervised manner, the model demonstrates superior performance on instruction-based editing tasks, whether or not the tasks require reasoning. The code will be available at https://github.com/Jin-Ying/ReasonPix2Pix.

