Most MOOC platforms either use simple schemes for aggregating peer grades, such as taking the mean or the median, or apply methodologies that considerably increase students' workload, such as calibrated peer review. To reduce the error between the instructor's scores and the students' aggregated scores in the simple schemes, without requiring demanding grading calibration phases, some proposals derive specific weights for a weighted aggregation of the peer grades. In this work, and in contrast to most previous studies, we analyse the use of students' engagement and performance measures to compute personalized weights, and we study the validity of the aggregated scores produced by the common functions, mean and median, together with two others from the information retrieval field, namely the geometric and harmonic means. To test this procedure we analysed data from a MOOC about Philosophy. The course had 1059 registered students, of whom 91 participated in a peer review process that consisted of writing an essay and rating three of their peers' essays using a rubric. We calculated and compared the aggregated scores obtained using the weighted and non-weighted versions. Our results show that, compared with plain peer grading, the validity of the aggregated scores and their correlation with the instructor's grades improve when the median is used and the weights are computed from students' performance in chapter tests.
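The four weighted aggregation functions compared in the study can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grade scale, the example grades, and the weights (imagined here as reliability scores derived from each grader's chapter-test performance) are hypothetical.

```python
import math

def weighted_mean(grades, weights):
    """Arithmetic mean of peer grades, each grade scaled by its weight."""
    return sum(g * w for g, w in zip(grades, weights)) / sum(weights)

def weighted_median(grades, weights):
    """Grade at which the cumulative weight first reaches half the total."""
    half, cum = sum(weights) / 2, 0.0
    for g, w in sorted(zip(grades, weights)):
        cum += w
        if cum >= half:
            return g

def weighted_geometric_mean(grades, weights):
    """Exponential of the weighted mean of log-grades (positive grades only)."""
    total = sum(weights)
    return math.exp(sum(w * math.log(g) for g, w in zip(grades, weights)) / total)

def weighted_harmonic_mean(grades, weights):
    """Total weight divided by the weighted sum of reciprocal grades."""
    return sum(weights) / sum(w / g for g, w in zip(grades, weights))

# Hypothetical example: three peer grades for one essay on a 0-10 scale,
# with per-grader weights (e.g., proportional to chapter-test scores).
grades = [6.0, 8.0, 9.0]
weights = [0.9, 0.5, 0.7]
```

With equal weights each function reduces to its unweighted counterpart, so the non-weighted baselines in the study are the special case `weights = [1, 1, 1]`. Note the standard ordering harmonic ≤ geometric ≤ arithmetic mean, which makes the harmonic mean the most conservative aggregator of the three means.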