A Predictive Model Using a Machine Learning Algorithm to Identify Students' Probability of Passing a Semestral Course

This study aims to develop a predictive model that estimates students' probability of passing their enrolled courses at the earliest stage of the semester. To arrive at a model with acceptable accuracy and precision, one whose output is useful for decision making in education systems, for improving the delivery of instruction, and for raising students' academic performance, the proponent strictly followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. The study uses classification as the data mining technique and a decision tree as the algorithm. The resulting model predicts whether students will pass their current courses with 0.7619 accuracy, 0.8333 precision, 0.8823 recall, and an F1 score of 0.8571, which indicates that the model is reliable, accurate, and recommendable. Considering these indicators and results, the prediction model used in this study is highly acceptable. Data mining techniques provide effective and efficient tools for analyzing and predicting student performance. The model can change how educators identify the weaknesses of students in their classes and improve the effectiveness of their teaching, help bring down academic failure rates, and help institution administrators adjust their learning systems and outcomes. Further study is highly needed on including students' demographic information, using a much larger dataset, and providing automated and manual handling of the predictive criteria so that students can see, as early as the midterm period, which criteria they must improve in order to pass their courses at the end of the semester.
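The four reported scores are all derived from the same confusion matrix, and the relationship can be sketched in a few lines of Python. The abstract does not report the underlying counts, so the values below (15 true positives, 3 false positives, 2 false negatives, 1 true negative, 21 students in total) are a hypothetical set chosen only because they reproduce the reported figures; the function name `classification_metrics` is likewise an illustrative assumption, not part of the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    "Positive" here means "predicted to pass the course".
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # share of all predictions that are correct
    precision = tp / (tp + fp)            # share of predicted passes that really pass
    recall = tp / (tp + fn)               # share of real passes that were predicted
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (NOT taken from the study), chosen to match the
# scores quoted in the abstract.
metrics = classification_metrics(tp=15, fp=3, fn=2, tn=1)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the recall is 15/17 ≈ 0.88235, which rounds to 0.8824; the abstract's 0.8823 would correspond to truncating rather than rounding. The other three scores agree with the abstract to four decimal places.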
