Accurate Multi-Category Student Performance Forecasting at Early Stages of Online Education Using Neural Networks

Accurately predicting and analyzing student performance in online education, both at the outset of and throughout the semester, is vital. Most published studies focus on binary classification (Fail or Pass), leaving a significant research gap in predicting student performance across multiple categories. This study introduces a novel neural network-based approach that accurately predicts student performance and identifies vulnerable students at early stages of online courses. The Open University Learning Analytics (OULA) dataset is used to develop and test the proposed model, which predicts four outcome categories: Distinction, Fail, Pass, and Withdrawn. The dataset is preprocessed to extract features from demographic data, assessment data, and clickstream interactions within a Virtual Learning Environment (VLE). Comparative simulations indicate that the proposed model significantly outperforms existing baseline models, including Artificial Neural Network Long Short-Term Memory (ANN-LSTM), Random Forest (RF) 'gini', RF 'entropy', and Deep Feed Forward Neural Network (DFFNN), in terms of accuracy, precision, recall, and F1-score. The prediction accuracy of the proposed method is about 25% higher than that of the existing state of the art. Furthermore, the model outperforms existing methodologies across the temporal progression of a course, achieving higher accuracy even when only the first 20% of the course has been completed.
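To make the four reported metrics concrete for the four-category setting, the following is a minimal sketch of how accuracy, precision, recall, and F1-score might be computed over the Distinction, Fail, Pass, and Withdrawn labels. The abstract does not state how per-class scores are aggregated, so macro averaging is an assumption here; the function name `multiclass_report` and the toy labels are hypothetical and are not taken from the OULA dataset or from the reported results.

```python
from collections import Counter

# The four outcome categories named in the abstract.
LABELS = ["Distinction", "Fail", "Pass", "Withdrawn"]


def multiclass_report(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (an unweighted mean over classes) is an assumption;
    the source does not say which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a false positive
            fn[truth] += 1   # true class received a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        # A class with no predictions (or no true samples) scores 0.0.
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Invented toy labels for illustration only.
y_true = ["Pass", "Pass", "Fail", "Withdrawn", "Distinction", "Pass", "Fail", "Withdrawn"]
y_pred = ["Pass", "Fail", "Fail", "Withdrawn", "Pass", "Pass", "Fail", "Pass"]
report = multiclass_report(y_true, y_pred)
```

In this toy case the rare Distinction class is never predicted, so it contributes zero to each macro average even though overall accuracy is unaffected by class rarity. This is one reason multi-category results are usually reported with precision, recall, and F1-score alongside accuracy.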
