Knowledge tracing (KT) is a popular approach for modeling students' learning progress over time, which can enable more personalized and adaptive learning. However, existing KT approaches face two major limitations: (1) they rely heavily on expert-defined knowledge concepts (KCs) in questions, a process that is time-consuming and error-prone; and (2) they tend to overlook the semantics of both the questions and the given KCs. In this work, we address these challenges and present KCQRL, a framework for automated knowledge concept annotation and question representation learning that can improve the effectiveness of any existing KT model. First, we propose an automated KC annotation process using large language models (LLMs), which generates question solutions and then annotates KCs in each solution step of the questions. Second, we introduce a contrastive learning approach to generate semantically rich embeddings for questions and solution steps, aligning them with their associated KCs via a tailored false negative elimination approach. These embeddings can be readily integrated into existing KT models, replacing their randomly initialized embeddings. We demonstrate the effectiveness of KCQRL across 15 KT algorithms on two large real-world math learning datasets, where we achieve consistent performance improvements.
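The contrastive alignment with false negative elimination can be sketched as below. This is a minimal illustrative sketch, not the paper's exact formulation: the function name, the use of in-batch negatives, and the rule of masking negatives that share the anchor's KC label are all assumptions made for the example.

```python
import numpy as np

def contrastive_kc_loss(step_emb, kc_emb, kc_ids, temperature=0.1):
    """InfoNCE-style loss aligning solution-step embeddings with their
    KC embeddings. In-batch negatives that share the anchor's KC id are
    treated as false negatives and masked out of the denominator.
    (Hypothetical sketch; shapes: step_emb, kc_emb are (B, d).)"""
    # Cosine similarity via L2 normalization, scaled by temperature.
    s = step_emb / np.linalg.norm(step_emb, axis=1, keepdims=True)
    k = kc_emb / np.linalg.norm(kc_emb, axis=1, keepdims=True)
    sim = s @ k.T / temperature                      # (B, B) logits

    # False-negative elimination: off-diagonal pairs with the same KC id
    # are not valid negatives, so exclude them from the softmax.
    n = len(kc_ids)
    same_kc = kc_ids[None, :] == kc_ids[:, None]
    false_neg = same_kc & ~np.eye(n, dtype=bool)
    sim[false_neg] = -np.inf                         # exp(-inf) == 0

    # Softmax cross-entropy with positives on the diagonal.
    log_z = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(log_z - np.diag(sim)))
```

In a full pipeline, `step_emb` and `kc_emb` would come from a trainable text encoder and the loss would be backpropagated; here NumPy is used only to make the masking logic explicit.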