Students who take an online course, such as a MOOC, use the course's discussion forum to ask questions or reach out to instructors when they encounter an issue. However, reading and responding to students' questions is difficult to scale because of the time needed to consider each message. As a result, critical issues may be left unresolved, and students may lose the motivation to continue in the course. To help address this problem, we build predictive models that automatically determine the urgency of each forum post, so that urgent posts can be brought to instructors' attention. This paper goes beyond previous work by predicting not just a binary urgent/non-urgent decision but a post's level of urgency on a 7-point scale. First, we train and cross-validate several models on an original data set of 3,503 posts from MOOCs at the University of Pennsylvania. Second, to determine the generalizability of our models, we test their performance on a separate, previously published data set of 29,604 posts from MOOCs at Stanford University. Whereas previous work on post urgency used only one data set, we evaluate the predictions across different data sets and courses. The best-performing model was a support vector regressor trained on Universal Sentence Encoder embeddings of the posts, achieving a root-mean-square error (RMSE) of 1.1 on the training set and 1.4 on the test set. Understanding the urgency of forum posts enables instructors to focus their time more effectively and, as a result, better support student learning.
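The RMSE figures above are easier to interpret with the metric written out. Below is a minimal sketch, assuming the standard definition: the square root of the mean squared difference between predicted and human-annotated urgency ratings on the 7-point scale. The function name `rmse` and the ratings in the usage line are hypothetical illustrations, not values from either data set.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between annotated and predicted urgency ratings."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared per-post errors, then take the square root.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ratings for three posts: errors of 1, 0, and 2 points.
print(round(rmse([1, 4, 7], [2, 4, 5]), 2))  # → 1.29
```

Because errors are squared before averaging, an RMSE of 1.4 on a 1 to 7 scale penalizes occasional large misjudgments of urgency more heavily than a plain mean absolute error would.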