EduEVAL-DB: A Role-Based Dataset for Pedagogical Risk Evaluation in Educational Explanations

This work introduces EduEVAL-DB, a dataset based on teacher roles designed to support the evaluation and training of automatic pedagogical evaluators and AI tutors for instructional explanations. The dataset comprises 854 explanations corresponding to 139 questions from a curated subset of the ScienceQA benchmark, spanning science, language, and social science across K-12 grade levels. For each question, one human-teacher explanation is provided and six are generated by LLM-simulated teacher roles. These roles are inspired by instructional styles and shortcomings observed in real educational practice and are instantiated via prompt engineering. We further propose a pedagogical risk rubric aligned with established educational standards, operationalizing five complementary risk dimensions: factual correctness, explanatory depth and completeness, focus and relevance, student-level appropriateness, and ideological bias. All explanations are annotated with binary risk labels through a semi-automatic process with expert teacher review. Finally, we present preliminary validation experiments to assess the suitability of EduEVAL-DB for evaluation. We benchmark a state-of-the-art education-oriented model (Gemini 2.5 Pro) against a lightweight local Llama 3.1 8B model and examine whether supervised fine-tuning on EduEVAL-DB supports pedagogical risk detection using models deployable on consumer hardware.
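The abstract describes two things: each explanation carries five binary risk labels, and models are evaluated as risk detectors. The Python below is a minimal sketch under stated assumptions. It represents one labeled explanation and scores a detector's predictions with per-dimension binary F1. Several details are hypothetical illustrations and are not the released dataset's schema or the paper's evaluation protocol: the record fields, the dimension keys, the role name `role_a`, the helper `no_risk`, and the choice of F1 as the metric.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the five rubric dimensions named in the abstract;
# the released EduEVAL-DB files may use different keys.
RISK_DIMENSIONS = (
    "factual_correctness",
    "depth_completeness",
    "focus_relevance",
    "student_level",
    "ideological_bias",
)


@dataclass
class ExplanationRecord:
    """One explanation with binary risk labels (1 = risk present, 0 = absent)."""
    question_id: int
    source: str  # hypothetical: "human" or the name of a simulated teacher role
    text: str
    labels: dict[str, int]


def per_dimension_f1(gold: list[ExplanationRecord],
                     pred: list[dict[str, int]]) -> dict[str, float]:
    """Binary F1 per risk dimension, treating label 1 (risk present) as positive."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores: dict[str, float] = {}
    for dim in RISK_DIMENSIONS:
        tp = fp = fn = 0
        for record, p in zip(gold, pred):
            g, y = record.labels[dim], p[dim]
            tp += int(g == 1 and y == 1)
            fp += int(g == 0 and y == 1)
            fn += int(g == 1 and y == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 is reported as 0.0 when neither gold nor
        # predictions contain any positive label for this dimension.
        scores[dim] = 2 * tp / denom if denom else 0.0
    return scores


def no_risk() -> dict[str, int]:
    """Hypothetical helper: all-zero label dict (no risk on any dimension)."""
    return {d: 0 for d in RISK_DIMENSIONS}


# Toy usage with made-up records (not drawn from the dataset).
gold = [
    ExplanationRecord(1, "human", "placeholder explanation", no_risk()),
    ExplanationRecord(1, "role_a", "placeholder explanation",
                      {**no_risk(), "factual_correctness": 1}),
]
pred = [no_risk(), {**no_risk(), "factual_correctness": 1}]
print(per_dimension_f1(gold, pred)["factual_correctness"])  # 1.0
```

The sketch keeps a separate score for each dimension rather than one aggregate score, which matches the abstract's framing of five complementary risk dimensions. A real evaluation would load records from the released files instead of building the toy list shown here.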
