Video generation has advanced rapidly, with recent methods producing increasingly convincing animated results. However, existing benchmarks, which are largely designed for realistic videos, struggle to evaluate animation-style generation, with its stylized appearance, exaggerated motion, and character-centric consistency. They also rely on fixed prompt sets and rigid pipelines, offering limited flexibility for open-domain content and custom evaluation needs. To address this gap, we introduce AnimationBench, the first systematic benchmark for evaluating animation image-to-video (I2V) generation. AnimationBench operationalizes the Twelve Basic Principles of Animation and IP Preservation into measurable evaluation dimensions, complemented by Broader Quality Dimensions including semantic consistency, motion rationality, and camera motion consistency. The benchmark supports both a standardized closed-set evaluation for reproducible comparison and a flexible open-set evaluation for diagnostic analysis, and it leverages vision-language models for scalable assessment. Extensive experiments show that AnimationBench aligns well with human judgment and exposes animation-specific quality differences overlooked by realism-oriented benchmarks, yielding more informative and discriminative evaluation of state-of-the-art I2V models.