EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education

While AI-generated content (AIGC) models have achieved remarkable success in generating photorealistic videos, their potential to support visual, story-driven learning in education remains largely untapped. To close this gap, we present EduAIGV-1k, the first benchmark dataset and evaluation framework dedicated to assessing the quality of AI-generated videos (AIGVs) designed to teach foundational math concepts, such as numbers and geometry, to young learners. EduAIGV-1k contains 1,130 short videos produced by ten state-of-the-art text-to-video (T2V) models from 113 pedagogy-oriented prompts. Each video is accompanied by fine-grained annotations along two complementary axes: (1) perceptual quality, disentangled into spatial and temporal fidelity, and (2) prompt alignment, labeled at the word and sentence levels to quantify how accurately each mathematical concept in the prompt is grounded in the generated video. These annotations turn each video into a multi-dimensional, interpretable supervision signal rather than a single quality score. Leveraging this dense feedback, we introduce EduVQA, which assesses both the perceptual quality and the alignment quality of AIGVs. In particular, we propose a Structured 2D Mixture-of-Experts (S2D-MoE) module, which strengthens the dependency between overall quality and each sub-dimension through shared experts and a dynamic 2D gating matrix. Extensive experiments show that EduVQA consistently outperforms existing video quality assessment (VQA) baselines. Both the dataset and the code will be made publicly available.
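
The abstract does not specify how the S2D-MoE module is built, so the following is only a minimal sketch of one plausible reading of "shared experts and a dynamic 2D gating matrix", not the authors' implementation. It assumes that a pool of experts is shared across all quality dimensions and that an input-dependent gate with one row per dimension and one column per expert decides how each dimension mixes the expert outputs. The class name `TwoDGatedMoE`, the dimension names, the layer sizes, the `tanh` experts, and the random weights are all hypothetical choices made for illustration.

```python
import math
import random

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class TwoDGatedMoE:
    """Hypothetical sketch: shared experts mixed by a (dimension x expert) gate."""

    def __init__(self, dims, n_experts, d_in, d_hidden, seed=0):
        rng = random.Random(seed)
        self.dims = list(dims)
        # Assumption: the same experts serve every quality dimension.
        self.experts = [rand_matrix(rng, d_hidden, d_in) for _ in range(n_experts)]
        # One gating projection per dimension; stacked, they form the 2D gate.
        self.gates = [rand_matrix(rng, n_experts, d_in) for _ in self.dims]
        # One linear scoring head per dimension.
        self.heads = [[rng.uniform(-0.5, 0.5) for _ in range(d_hidden)]
                      for _ in self.dims]

    def gate_matrix(self, x):
        # Input-dependent ("dynamic") gate; each row is a distribution over experts.
        return [softmax(matvec(g, x)) for g in self.gates]

    def forward(self, x):
        expert_out = [[math.tanh(v) for v in matvec(e, x)] for e in self.experts]
        gate = self.gate_matrix(x)
        scores = {}
        for name, weights, head in zip(self.dims, gate, self.heads):
            # Mix the shared expert outputs with this dimension's gate row.
            mixed = [sum(w * out[j] for w, out in zip(weights, expert_out))
                     for j in range(len(head))]
            scores[name] = sum(h * m for h, m in zip(head, mixed))
        return scores

model = TwoDGatedMoE(["overall", "spatial", "temporal"],
                     n_experts=4, d_in=8, d_hidden=6)
features = [0.1 * i for i in range(8)]  # stand-in for a video feature vector
gate = model.gate_matrix(features)      # 3 x 4 matrix: dimensions x experts
scores = model.forward(features)        # one untrained score per dimension
```

Because every dimension draws on the same experts, the overall score and the sub-dimension scores are computed from shared intermediate representations, which is one way the dependency mentioned in the abstract could arise. The weights here are random and untrained, so the scores carry no meaning beyond showing the data flow.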
