KTbench: A Novel Data Leakage-Free Framework for Knowledge Tracing

Knowledge Tracing (KT) is concerned with predicting students' future performance on learning items in intelligent tutoring systems. Learning items are tagged with skill labels called knowledge concepts (KCs). Many KT models expand the sequence of item-student interactions into KC-student interactions by replacing each learning item with its constituent KCs, which typically lengthens the sequence. This approach addresses the sparsity of item-student interactions and reduces the number of model parameters. However, two problems have been identified with such models. The first is that the model can learn correlations between KCs belonging to the same item, which can leak ground-truth labels and hinder performance. The resulting drop in performance can be significant on datasets with a higher number of KCs per item. The second is that the available benchmark implementations do not account for the change in sequence length when expanding KCs, so different models are tested with varying sequence lengths yet still compared against the same benchmark. To address these problems, we introduce a general masking framework that mitigates the first problem and enhances the performance of such KT models while preserving the original model architecture without significant alterations. Additionally, we introduce KTbench, an open-source benchmark library designed to ensure the reproducibility of this work while mitigating the second problem.
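The expansion step and both problems can be illustrated with a toy sketch. It is not the paper's masking framework or KTbench's API; the item-to-KC mapping, the function names, and the data are all hypothetical, and it assumes consecutive interactions never involve the same item. It shows how expansion lengthens the sequence, and which positions an autoregressive model could predict simply by copying the previous step's label, because both steps come from the same item.

```python
from typing import Dict, List, Tuple

# Hypothetical toy mapping: item id -> KC ids tagged on that item.
ITEM_TO_KCS: Dict[int, List[int]] = {10: [1, 2, 3], 11: [2], 12: [4, 5]}

def expand_to_kcs(
    interactions: List[Tuple[int, int]],
    item_to_kcs: Dict[int, List[int]],
) -> List[Tuple[int, int, int]]:
    """Replace each (item, correct) pair with one (kc, correct, item) triple per KC."""
    expanded = []
    for item, correct in interactions:
        for kc in item_to_kcs[item]:
            # Every KC of an item inherits that item's response label.
            expanded.append((kc, correct, item))
    return expanded

def leaky_positions(expanded: List[Tuple[int, int, int]]) -> List[int]:
    """Indices whose label was already visible at the previous step.

    Assumes consecutive interactions are on different items, so two
    adjacent steps with the same item id come from a single interaction.
    """
    return [
        i for i in range(1, len(expanded))
        if expanded[i][2] == expanded[i - 1][2]
    ]

item_seq = [(10, 1), (11, 0), (12, 1)]  # (item, correct)
kc_seq = expand_to_kcs(item_seq, ITEM_TO_KCS)

print(len(item_seq), len(kc_seq))  # 3 6
print(leaky_positions(kc_seq))     # [1, 2, 5]
```

In this toy case, half of the expanded positions are predictable from the preceding step alone. A fixed truncation length also covers fewer original interactions after expansion than before, which is why comparing models at the same nominal sequence length can be misleading.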

