Instruction-based image editing aims to equip a generative model with the ability to follow human-written instructions when editing images. Current approaches can typically comprehend explicit and specific instructions. However, they often lack the active reasoning needed to understand instructions that are implicit or underspecified. To strengthen active reasoning and make the editing model more intelligent, we introduce ReasonPix2Pix, a comprehensive reasoning-attentive instruction editing dataset. The dataset is characterized by 1) reasoning instructions, 2) more realistic images from fine-grained categories, and 3) greater variance between input and edited images. When fine-tuned on our dataset in a supervised manner, the model demonstrates superior performance on instruction-based editing tasks, regardless of whether the tasks require reasoning. The code will be available at https://github.com/Jin-Ying/ReasonPix2Pix.