The increasing availability of learning activity data in Massive Open Online Courses (MOOCs) enables large-scale analysis of learners' behavior. In this paper, we analyze a dataset of 351 million learning activities from 0.8 million unique learners enrolled in more than 1,600 courses over two years. Specifically, we mine and identify the learning patterns of the crowd from both temporal and course enrollment perspectives, leveraging mutual information theory and sequential pattern mining methods. From the temporal perspective, we find that the time intervals between a learner's consecutive learning activities follow a mixture of a power-law distribution and a periodic cosine function. From the course enrollment perspective, by quantifying the relationship between course pairs, we observe that the most frequently co-enrolled courses usually belong to the same category or are offered by the same university. We demonstrate that these findings can facilitate a range of applications, including course recommendation. A simple recommendation model that uses the course enrollment patterns is competitive with the baselines while training 200$\times$ faster.
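
As a rough illustration of what quantifying the relationship between course pairs could look like, the sketch below scores co-enrolled courses with pointwise mutual information (PMI). This is an assumption made for illustration: the abstract names mutual information theory but does not give the exact measure, and the function name `course_pair_pmi` and the input layout (a mapping from learner ID to a set of course IDs) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def course_pair_pmi(enrollments):
    """Score each co-enrolled course pair with pointwise mutual information.

    `enrollments` maps a learner ID to the set of course IDs that learner
    enrolled in (a hypothetical layout, not the paper's data format).
    Returns a dict mapping a sorted (course_a, course_b) tuple to its PMI
    in bits. Pairs that are never co-enrolled are omitted.
    """
    n_learners = len(enrollments)
    course_counts = Counter()
    pair_counts = Counter()

    for courses in enrollments.values():
        unique = sorted(set(courses))
        course_counts.update(unique)
        # Count every unordered pair of courses taken by the same learner.
        pair_counts.update(combinations(unique, 2))

    pmi = {}
    for (a, b), n_ab in pair_counts.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), with probabilities
        # estimated as fractions of learners.
        pmi[(a, b)] = math.log2(
            n_ab * n_learners / (course_counts[a] * course_counts[b])
        )
    return pmi


if __name__ == "__main__":
    toy = {
        "u1": {"A", "B"},
        "u2": {"A", "B"},
        "u3": {"C"},
        "u4": {"A", "C"},
    }
    for pair, score in sorted(course_pair_pmi(toy).items()):
        print(pair, round(score, 3))
```

A positive score means the two courses are taken together more often than independent enrollment would predict, and a negative score means less often. Ranking pairs by such a score is one plausible way to surface strongly associated courses. It is not necessarily the measure used in the paper.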