Early Prediction of Student Performance Using Bayesian Updating with Informative Priors Across Cohorts

Early identification of at-risk students in higher education depends on predictive models that maintain accuracy across successive cohorts, a requirement that single-cohort modeling approaches fail to meet. This study evaluates whether Bayesian updating with informative priors from a previous cohort improves the robustness of cross-cohort prediction from digital trace data. We fit weekly Bayesian linear, logistic, and ordinal regression models with either uninformative default priors or informative priors derived from the posterior distributions of the preceding cohort. Models were applied to six weekly self-regulated learning (SRL)-aligned engagement indicators from two consecutive cohorts of students in a blended first-year mathematics course (N1 = 307; N2 = 323). Outcomes were exam points, final grades, and a binary at-risk indicator. The models were evaluated weekly on accuracy, sensitivity, and RMSE. In the source cohort, performance was already substantial by week 6. In the target cohort, informative priors improved early classification: logistic models with informative priors reduced misclassification by 22% and false negatives by 38% in week 3 relative to the uninformative default. Ordinal models with informative priors likewise showed their strongest improvements in the early weeks, reducing misclassification by 42% in week 2 and reaching an accuracy of .77 by week 4. Linear models showed little benefit from prior information. These findings demonstrate that Bayesian updating is a viable method for improving early classification performance across cohorts, with gains concentrated in the early weeks of the semester, when current-cohort data are scarce.
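The updating mechanism described above can be illustrated with a minimal sketch. It is not the study's model: it assumes a conjugate Gaussian linear regression with known noise variance, simulated data, and hypothetical names (`posterior`, `simulate_cohort`, `true_w`), and it uses a small target sample as a stand-in for scarce early-semester information. The posterior fitted on a source cohort is passed on as the prior for the target cohort and compared with a weak default prior.

```python
import numpy as np

def posterior(X, y, prior_mean, prior_cov, noise_var=1.0):
    """Conjugate Gaussian update for regression weights (noise variance assumed known)."""
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

def simulate_cohort(rng, n, weights, noise_sd=1.0):
    """Simulated engagement indicators and a continuous outcome (illustrative only)."""
    X = rng.normal(size=(n, weights.size))
    y = X @ weights + rng.normal(scale=noise_sd, size=n)
    return X, y

rng = np.random.default_rng(0)
true_w = np.array([0.8, -0.5, 0.3])  # hypothetical effects, not taken from the study

X1, y1 = simulate_cohort(rng, 307, true_w)  # source cohort
X2, y2 = simulate_cohort(rng, 15, true_w)   # target cohort, deliberately little data

k = true_w.size
weak_mean, weak_cov = np.zeros(k), 100.0 * np.eye(k)  # weak default prior

# Step 1: fit the source cohort under the weak default prior.
src_mean, src_cov = posterior(X1, y1, weak_mean, weak_cov)

# Step 2: reuse the source posterior as the informative prior for the target cohort.
informed_mean, informed_cov = posterior(X2, y2, src_mean, src_cov)

# Baseline: fit the target cohort under the weak default prior only.
default_mean, default_cov = posterior(X2, y2, weak_mean, weak_cov)

print("posterior sd, informative prior:", np.sqrt(np.diag(informed_cov)))
print("posterior sd, default prior:    ", np.sqrt(np.diag(default_cov)))
```

In this conjugate setting the informative prior can only tighten the target-cohort posterior, and the tightening matters most when the target data are few, which is consistent with gains being concentrated in the early weeks. Logistic and ordinal models have no such closed form and would typically be fit by sampling, with the source posterior approximated, for example, by normal priors on the coefficients; that approximation is an assumption here, not a detail stated in the abstract.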
