Knowledge Tracing (KT) is concerned with predicting students' future performance on learning items in intelligent tutoring systems. Learning items are tagged with skill labels called knowledge concepts (KCs). Many KT models expand the sequence of item-student interactions into a sequence of KC-student interactions by replacing each learning item with its constituent KCs, which often lengthens the sequence. This approach alleviates the sparsity of item-student interactions and reduces the number of model parameters. However, we identify two problems with such models. First, the model can learn correlations between KCs that belong to the same item, which can leak ground-truth labels and hinder performance; on datasets with more KCs per item, this can cause a significant drop in performance. Second, existing benchmark implementations do not account for the change in sequence length caused by KC expansion, so different models are tested with different sequence lengths yet compared against the same benchmark. To address these problems, we introduce a general masking framework that mitigates the first problem and improves the performance of such KT models without significantly altering the original model architecture. We also introduce KTbench, an open-source benchmark library that mitigates the second problem and ensures the reproducibility of this work.
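To make both problems concrete, the following minimal Python sketch expands a toy item sequence into KC rows and builds a causal mask that hides the labels of other KCs from the same interaction. The toy data and the names `ITEM_KCS`, `expand_to_kcs` and `same_interaction_mask` are hypothetical illustrations. This is not the masking framework proposed in this work. Because every KC row inherits its item's correctness label, an unmasked model predicting the second KC of an item can simply copy the label of the first one. The sketch also shows that the expanded sequence is longer than the original, which is why a fixed maximum sequence length covers a different number of interactions depending on whether KCs are expanded.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each item maps to the KCs it is tagged with.
ITEM_KCS: Dict[str, List[str]] = {
    "q1": ["kc_a", "kc_b"],
    "q2": ["kc_c"],
    "q3": ["kc_a", "kc_c", "kc_d"],
}


def expand_to_kcs(
    interactions: List[Tuple[str, int]], item_kcs: Dict[str, List[str]]
) -> List[Tuple[int, str, int]]:
    """Replace each (item, correct) interaction with one row per KC.

    Each row is (interaction_index, kc, correct). Every KC row inherits the
    item's label, which is the source of the leakage described above.
    """
    expanded = []
    for idx, (item, correct) in enumerate(interactions):
        for kc in item_kcs[item]:
            expanded.append((idx, kc, correct))
    return expanded


def same_interaction_mask(expanded: List[Tuple[int, str, int]]) -> List[List[bool]]:
    """mask[t][s] is True when step t may use step s's label.

    Only earlier steps (s < t) from a *different* interaction are visible.
    Comparing interaction indices rather than item IDs keeps genuine past
    attempts at the same item visible.
    """
    n = len(expanded)
    mask = [[False] * n for _ in range(n)]
    for t in range(n):
        for s in range(t):
            mask[t][s] = expanded[s][0] != expanded[t][0]
    return mask


if __name__ == "__main__":
    history = [("q1", 1), ("q2", 0), ("q3", 1)]
    rows = expand_to_kcs(history, ITEM_KCS)
    print(len(history), "interactions ->", len(rows), "KC rows")
    mask = same_interaction_mask(rows)
    for t, row in enumerate(rows):
        visible = [s for s in range(len(rows)) if mask[t][s]]
        print(t, row, "visible:", visible)
```

Here the three interactions become six KC rows. Step 1 (`kc_b` of `q1`) cannot see step 0, whose label equals its own target, while steps from `q1` and `q2` remain visible to every KC of `q3`.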