Story visualization aims to generate coherent image sequences that faithfully represent a narrative and match given character references. Despite progress in generative models, existing benchmarks remain narrow in scope: they are often limited to short prompts, lack character references, or cover only single-image cases. As a result, they fail to reflect real-world narrative complexity and obscure true model performance. We introduce ViStoryBench, a comprehensive benchmark designed to evaluate story visualization models across varied narrative structures, visual styles, and character settings. It features richly annotated multi-shot scripts derived from curated stories spanning literature, film, and folklore. Large language models assist in story summarization and script generation, with all outputs verified by humans for coherence and fidelity. Character references are carefully curated to maintain consistency across different artistic styles. ViStoryBench proposes a suite of multi-dimensional automated metrics to evaluate character consistency, style similarity, prompt alignment, aesthetic quality, and artifacts such as copy-paste behavior. These metrics are validated through human studies and used to assess a broad range of open-source and commercial models, enabling systematic analysis and encouraging advances in visual storytelling.