We present FireRed-Image-Edit, a diffusion transformer for instruction-based image editing that achieves state-of-the-art performance through systematic optimization of data curation, training methodology, and evaluation design. We construct a 1.6B-sample training corpus, comprising 900M text-to-image and 700M image editing pairs from diverse sources. After rigorous cleaning, stratification, auto-labeling, and two-stage filtering, we retain over 100M high-quality samples balanced between generation and editing, ensuring strong semantic coverage and instruction alignment. Our multi-stage training pipeline progressively builds editing capability via pre-training, supervised fine-tuning, and reinforcement learning. To improve data efficiency, we introduce a Multi-Condition Aware Bucket Sampler for variable-resolution batching and Stochastic Instruction Alignment with dynamic prompt re-indexing. To stabilize optimization and enhance controllability, we propose Asymmetric Gradient Optimization for DPO, DiffusionNFT with layout-aware OCR rewards for text editing, and a differentiable Consistency Loss for identity preservation. We further establish REDEdit-Bench, a comprehensive benchmark spanning 15 editing categories, including newly introduced beautification and low-level enhancement tasks. Extensive experiments on REDEdit-Bench and public benchmarks (ImgEdit and GEdit) demonstrate competitive or superior performance against both open-source and proprietary systems. To support future research, our code, models, and benchmark suite are publicly available at https://github.com/FireRedTeam/FireRed-Image-Edit/ .