Significant progress has been made in the field of Instruction-based Image Editing (IIE). However, evaluating these models remains a major challenge. A crucial requirement in this field is a comprehensive evaluation benchmark that accurately assesses editing results and provides valuable insights for the field's further development. In response to this need, we propose I2EBench, a comprehensive benchmark designed to automatically evaluate the quality of images edited by IIE models across multiple dimensions. I2EBench consists of 2,000+ images for editing, along with 4,000+ corresponding instructions in both original and diverse forms. It offers three distinctive characteristics: 1) Comprehensive Evaluation Dimensions: I2EBench comprises 16 evaluation dimensions covering both high-level and low-level aspects, providing a comprehensive assessment of each IIE model. 2) Human Perception Alignment: to ensure that our benchmark aligns with human perception, we conducted an extensive user study for each evaluation dimension. 3) Valuable Research Insights: by analyzing the strengths and weaknesses of existing IIE models across the 16 dimensions, we offer research insights to guide future development in the field. We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results of new IIE models. The code, dataset, and generated images from all IIE models are available on GitHub: https://github.com/cocoshe/I2EBench.