Fine-grained skill representations, commonly referred to as knowledge components (KCs), are fundamental to many approaches in student modeling and learning analytics. However, KC-level correctness labels are rarely available in real-world datasets, especially for open-ended programming tasks where solutions typically involve multiple KCs simultaneously. Simply propagating problem-level correctness to all associated KCs obscures partial mastery and often leads to poorly fitted learning curves. To address this challenge, we propose an automated framework that leverages large language models (LLMs) to label KC-level correctness directly from student-written code. Our method assesses whether each KC is correctly applied and further introduces a temporal context-aware code-to-KC mapping mechanism to better align KCs with each student's individual code. We evaluate the resulting KC-level correctness labels in terms of learning curve fit and predictive performance using the power law of practice and the Additive Factors Model. Experimental results show that, compared to baselines, our framework yields learning curves that are more consistent with cognitive theory and improves predictive performance. Human evaluation further demonstrates substantial agreement between LLM and expert annotations.