Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with systems like GPT-4o and Nano-Banana setting new benchmarks. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images. We introduce Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing. Our dataset is constructed by leveraging Nano-Banana to generate diverse edit pairs from real photographs in the OpenImages collection. What distinguishes Pico-Banana-400K from previous synthetic datasets is our systematic approach to quality and diversity. We employ a fine-grained image editing taxonomy to ensure comprehensive coverage of edit types, and we maintain precise content preservation and instruction faithfulness through MLLM-based quality scoring and careful curation. Beyond single-turn editing, Pico-Banana-400K enables research into complex editing scenarios. The dataset includes three specialized subsets: (1) a 72K-example multi-turn collection for studying sequential editing, reasoning, and planning across consecutive modifications; (2) a 56K-example preference subset for alignment research and reward model training; and (3) paired long-short editing instructions for developing instruction rewriting and summarization capabilities. By providing this large-scale, high-quality, and task-rich resource, Pico-Banana-400K establishes a robust foundation for training and benchmarking the next generation of text-guided image editing models.
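
To make the curation idea concrete, here is a minimal sketch of how MLLM quality scores might be used to separate retained edits from rejected ones and to form preference pairs. The abstract does not specify the record layout, the score scale, the threshold, or how preference pairs are assembled, so every name below (`EditCandidate`, `split_by_quality`, `build_preference_pairs`, the field names, the 0.7 cutoff, the taxonomy labels) is a hypothetical chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EditCandidate:
    """One attempted edit of a real source image (all field names are hypothetical)."""
    source_image: str     # ID or path of the original photograph
    edited_image: str     # ID or path of the model-generated edit
    instruction: str      # text instruction given to the editing model
    edit_type: str        # taxonomy label (made-up values in this sketch)
    quality_score: float  # MLLM-assigned score, assumed here to lie in [0, 1]


def split_by_quality(
    candidates: List[EditCandidate], threshold: float = 0.7
) -> Tuple[List[EditCandidate], List[EditCandidate]]:
    """Split candidates into (accepted, rejected) using an assumed score cutoff."""
    accepted = [c for c in candidates if c.quality_score >= threshold]
    rejected = [c for c in candidates if c.quality_score < threshold]
    return accepted, rejected


def build_preference_pairs(
    accepted: List[EditCandidate], rejected: List[EditCandidate]
) -> List[Tuple[EditCandidate, EditCandidate]]:
    """Pair each accepted edit with rejected edits of the same image and instruction.

    This pairing rule is an assumption of the sketch, not a description of
    how the dataset's preference subset was actually built.
    """
    rejected_by_key: Dict[Tuple[str, str], List[EditCandidate]] = {}
    for r in rejected:
        rejected_by_key.setdefault((r.source_image, r.instruction), []).append(r)

    pairs = []
    for a in accepted:
        for r in rejected_by_key.get((a.source_image, a.instruction), []):
            pairs.append((a, r))  # (preferred, dispreferred)
    return pairs


if __name__ == "__main__":
    candidates = [
        EditCandidate("img1", "img1_a", "remove the car", "object_removal", 0.9),
        EditCandidate("img1", "img1_b", "remove the car", "object_removal", 0.3),
        EditCandidate("img2", "img2_a", "make it snowy", "scene_change", 0.8),
    ]
    accepted, rejected = split_by_quality(candidates)
    for preferred, dispreferred in build_preference_pairs(accepted, rejected):
        print(preferred.edited_image, "preferred over", dispreferred.edited_image)
```

Under these assumptions, the accepted list would feed single-turn training data, while the (preferred, dispreferred) tuples illustrate the kind of record a preference subset for reward model training could contain.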

