Knowledge Tracing (KT) aims to predict students' future performance by tracking how their knowledge states develop. Despite recent progress in this field, the application of KT models in education systems remains restricted by data-related issues: 1) limited access to real-life data due to data protection concerns, 2) lack of diversity in public datasets, and 3) noise in benchmark datasets, such as duplicate records. To address these problems, we simulated student data using three statistical strategies based on public datasets and evaluated the simulated data with two KT baselines. While we observe only minor performance improvement from adding synthetic data to the real data, our work shows that training on synthetic data alone can yield performance similar to training on real data.