Educational stakeholders are often particularly interested in sparse, delayed student outcomes, such as end-of-year statewide exams. The infrequency of such assessments makes it harder to identify students likely to fail them, and it slows the ability of researchers and educators to assess the effectiveness of particular educational tools. Prior work has primarily focused on using logs from students' full usage (e.g., year-long) of an educational product to predict outcomes, or has considered how accurately a few minutes of usage can predict outcomes after a short (e.g., 1 hour) session. In contrast, we investigate whether machine learning predictors using students' logs from their first few hours of usage can provide useful predictive insight into those students' end-of-school-year external assessments. We do this on three diverse datasets: one from students in Uganda using a literacy game product, and two from students in the US using mathematics intelligent tutoring systems. We consider various measures of the accuracy of the resulting predictors, including their ability to identify students at different points along the assessment performance distribution. Our findings suggest that short-term usage log data, from 2-5 hours, can provide valuable signal about students' long-term external performance.