MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are either zero-shot or trained on an automatically synthesized dataset, which contains a high volume of noise. Thus, they still require lots of manual tuning to produce desirable outcomes in practice. To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing. MagicBrush comprises over 10K manually annotated triplets (source image, instruction, target image), which support training large-scale text-guided image editing models. We fine-tune InstructPix2Pix on MagicBrush and show that the new model can produce much better images according to human evaluation. We further conduct extensive experiments to evaluate current image editing baselines from multiple dimensions including quantitative, qualitative, and human evaluations. The results reveal the challenging nature of our dataset and the gap between current baselines and real-world editing needs.
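
To make the triplet structure concrete, the following is a minimal sketch of how one (source image, instruction, target image) record and the four editing scenarios might be represented in Python. The class names, field names, and file paths are hypothetical and chosen only for illustration; they are not taken from the MagicBrush release, whose actual schema may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EditTriplet:
    """One annotated edit (hypothetical schema, not the official one)."""
    source_image: str            # path to the image before editing
    instruction: str             # natural-language edit instruction
    target_image: str            # path to the image after editing
    mask: Optional[str] = None   # optional path to an edit-region mask

    @property
    def mask_provided(self) -> bool:
        # Mask-provided vs. mask-free editing depends only on whether a mask exists.
        return self.mask is not None


@dataclass
class EditSession:
    """A sequence of edits on one image; one turn or several."""
    turns: List[EditTriplet] = field(default_factory=list)

    @property
    def multi_turn(self) -> bool:
        # A session with more than one edit is treated as multi-turn.
        return len(self.turns) > 1


# Illustrative records with made-up file names.
first = EditTriplet("img_0.png", "add a red hat to the dog", "img_1.png")
second = EditTriplet("img_1.png", "make the sky cloudy", "img_2.png", mask="mask_1.png")
session = EditSession(turns=[first, second])
```

Here `session.multi_turn` is true because the session holds two edits, while `first.mask_provided` is false and `second.mask_provided` is true, which covers the mask-free and mask-provided cases respectively.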
