Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are either zero-shot or trained on automatically synthesized datasets, which contain a high volume of noise. Thus, they still require lots of manual tuning to produce desirable outcomes in practice. To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing. MagicBrush comprises over 10K manually annotated triplets (source image, instruction, target image), which supports training large-scale text-guided image editing models. We fine-tune InstructPix2Pix on MagicBrush and show that the new model can produce much better images according to human evaluation. We further conduct extensive experiments to evaluate current image editing baselines from multiple dimensions, including quantitative, qualitative, and human evaluations. The results reveal the challenging nature of our dataset and the gap between current baselines and real-world editing needs.