AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI

Artificial Intelligence Generated Content (AIGC) is advancing rapidly, particularly in video generation. This paper introduces AIGCBench, a pioneering, comprehensive, and scalable benchmark designed to evaluate a variety of video generation tasks, with a primary focus on Image-to-Video (I2V) generation. Existing benchmarks are limited by a lack of diverse datasets; AIGCBench addresses this with a varied, open-domain image-text dataset on which different state-of-the-art algorithms are evaluated under equivalent conditions. We employ a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models. To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions: control-video alignment, motion effects, temporal consistency, and video quality. The metrics include both reference-video-dependent and reference-video-free measures, ensuring a comprehensive evaluation strategy. The proposed evaluation standard correlates well with human judgment and provides insights into the strengths and weaknesses of current I2V algorithms. The findings from our extensive experiments aim to stimulate further research and development in the I2V field. AIGCBench represents a significant step toward standardized benchmarks for the broader AIGC landscape, proposing an adaptable and equitable framework for future assessments of video generation tasks. We have open-sourced the dataset and evaluation code on the project website: https://www.benchcouncil.org/AIGCBench.
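As an illustration of what "correlates well with human judgment" can mean in practice, the sketch below computes a Spearman rank correlation between an automatic metric and human ratings. This is only one common way to measure such agreement; the abstract does not state which correlation measure AIGCBench uses, and the function names and all scores here are hypothetical, made-up values rather than results from the benchmark.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-video scores (NOT from the paper): one automatic metric
# and the mean human rating for the same five generated videos.
metric_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
human_ratings = [3.1, 3.8, 2.9, 4.4, 3.0]

rho = spearman(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho near 1 would indicate that the metric orders videos much as human raters do; a value near 0 would indicate little agreement.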