Early Detection of At-Risk Students Using Machine Learning

This research presents preliminary work on identifying at-risk students using supervised machine learning and three categories of data: engagement, demographics, and performance. The data were collected in Fall 2023 from Canvas and the California State University, Fullerton dashboard. We aim to tackle the persistent challenges of retention and student dropout in higher education by screening for at-risk students and building a high-risk identification system. By considering previously overlooked behavioral factors alongside traditional metrics, this work aims to address educational gaps, enhance student outcomes, and boost student success across disciplines at the University. Pre-processing establishes a target variable, anonymizes student information, handles missing data, and identifies the most significant features. Given the mixed data types in the datasets and the binary classification nature of the task, this work considers several machine learning models: Support Vector Machines (SVM), Naive Bayes, K-Nearest Neighbors (KNN), Decision Trees, Logistic Regression, and Random Forest. These models predict at-risk students and identify critical periods of the semester when student performance is most vulnerable. We use validation techniques such as train-test split and k-fold cross-validation to assess the reliability of the models. Our analysis indicates that all algorithms produce acceptable predictions of at-risk students, with Naive Bayes performing best overall.
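As an illustration of the validation step described above, the following is a minimal sketch of k-fold cross-validation around a small Gaussian Naive Bayes classifier, written with the Python standard library only. It is not the authors' implementation. The data are synthetic, and the feature names (weekly logins, average score), the class proportions, and all function names are assumptions made for this example.

```python
import math
import random
from statistics import mean, pvariance


def make_synthetic(n=200, seed=0):
    """Hypothetical data: [logins_per_week, avg_score] -> at-risk label (1 or 0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        at_risk = 1 if rng.random() < 0.3 else 0
        logins = rng.gauss(3.0 if at_risk else 8.0, 2.0)
        score = rng.gauss(60.0 if at_risk else 80.0, 10.0)
        rows.append(([logins, score], at_risk))
    return rows


def fit_gaussian_nb(rows):
    """Estimate, per class, the prior and each feature's mean and variance."""
    model = {}
    for label in {y for _, y in rows}:
        feats = [x for x, y in rows if y == label]
        cols = list(zip(*feats))
        model[label] = (
            len(feats) / len(rows),               # class prior
            [mean(c) for c in cols],              # feature means
            [pvariance(c) + 1e-9 for c in cols],  # variances, smoothed to avoid 0
        )
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(label):
        prior, mus, variances = model[label]
        total = math.log(prior)
        for value, mu, var in zip(x, mus, variances):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


def k_fold_accuracy(rows, k=5, seed=0):
    """Shuffle, split into k folds, and return the accuracy on each held-out fold."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit_gaussian_nb(train)
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


if __name__ == "__main__":
    fold_scores = k_fold_accuracy(make_synthetic(), k=5)
    print("per-fold accuracy:", [round(s, 3) for s in fold_scores])
    print("mean accuracy:", round(mean(fold_scores), 3))
```

Each student record is held out exactly once, so the per-fold scores give a rough sense of how stable a model is across different subsets of the data. Because at-risk students are typically a minority class, accuracy alone can be misleading; in practice one would likely also report recall or precision for the at-risk class.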

