Time series are the most prevalent form of input data for educational prediction tasks. The vast majority of research using time series data relies on hand-crafted features, designed by experts for predictive performance and interpretability. However, extracting these features is labor-intensive for both humans and computers. In this paper, we propose an approach that uses graph neural networks to model irregular multivariate time series. On raw clickstream time series, it achieves accuracy comparable to or better than that obtained with hand-crafted features. Furthermore, we extend concept activation vectors to provide interpretability for raw time series models. We analyze these advances in the education domain, addressing the task of early student performance prediction to enable downstream targeted interventions and instructional support. Our experimental analysis on 23 MOOCs, with millions of combined interactions across six behavioral dimensions, shows that models designed with our approach can (i) outperform state-of-the-art educational time series baselines without any feature extraction and (ii) provide interpretable insights for personalized interventions. Source code: https://github.com/epfl-ml4ed/ripple/.