AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI

The field of Artificial Intelligence Generated Content (AIGC) is advancing rapidly, particularly in video generation. This paper introduces AIGCBench, a pioneering, comprehensive, and scalable benchmark designed to evaluate a variety of video generation tasks, with a primary focus on Image-to-Video (I2V) generation. Existing benchmarks are limited by a lack of diverse datasets; AIGCBench addresses this with a varied, open-domain image-text dataset on which different state-of-the-art algorithms are evaluated under equivalent conditions. We employ a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models. To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions: control-video alignment, motion effects, temporal consistency, and video quality. The metrics include both reference-video-dependent and reference-video-free measures, ensuring a comprehensive evaluation strategy. The proposed evaluation standard correlates well with human judgment and provides insights into the strengths and weaknesses of current I2V algorithms. We hope the findings from our extensive experiments will stimulate further research and development in the I2V field. AIGCBench represents a significant step toward standardized benchmarks for the broader AIGC landscape, proposing an adaptable and equitable framework for future assessments of video generation tasks. We have open-sourced the dataset and evaluation code on the project website: https://www.benchcouncil.org/AIGCBench.
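As a rough illustration of the text-combiner idea mentioned above, the following minimal sketch enumerates draft prompts from a few attribute pools. The pool names, their entries, the prompt template, and the function `combine_prompts` are all hypothetical and chosen only for illustration; the abstract does not specify how the actual combiner works. The later steps (refining the drafts with GPT-4 and rendering images with a Text-to-Image model) are omitted.

```python
from itertools import product

# Hypothetical attribute pools; the category names and entries are
# illustrative assumptions, not taken from the AIGCBench dataset.
SUBJECTS = ["a dragon", "an astronaut"]
ACTIONS = ["flying", "walking"]
BACKGROUNDS = ["over a snowy mountain", "through a neon city"]
STYLES = ["oil painting", "photorealistic"]


def combine_prompts(subjects, actions, backgrounds, styles):
    """Return one draft prompt per combination of the four attribute pools."""
    return [
        f"{subject} {action} {background}, {style} style"
        for subject, action, background, style in product(
            subjects, actions, backgrounds, styles
        )
    ]


# Every combination yields one draft prompt (2 * 2 * 2 * 2 = 16 here).
prompts = combine_prompts(SUBJECTS, ACTIONS, BACKGROUNDS, STYLES)
```

Enumerating the Cartesian product keeps the prompt set open-domain while making its coverage easy to audit: each attribute value appears in the same number of prompts.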
