Student modeling, the task of inferring a student's learning characteristics from their interactions with coursework, is a fundamental problem in intelligent education. Although recent work in knowledge tracing and cognitive diagnosis has proposed several promising directions for improving the usability and effectiveness of current models, existing public datasets remain insufficient for these approaches because they lack complete exercising contexts, fine-grained concepts, and cognitive labels. In this paper, we present MoocRadar, a fine-grained, multi-aspect knowledge repository consisting of 2,513 exercise questions, 5,600 knowledge concepts, and over 12 million behavioral records. Specifically, we propose a framework to ensure high-quality, comprehensive annotation of fine-grained concepts and cognitive labels. Statistical and experimental results indicate that our dataset provides a basis for future improvements to existing methods. Moreover, to make the repository convenient for researchers to use, we release a set of tools for data querying, model adaptation, and even extension of the repository, available at https://github.com/THU-KEG/MOOC-Radar.