Automated essay scoring (AES) is a useful tool in English as a Foreign Language (EFL) writing education, offering real-time essay scores for students and instructors. However, previous AES models were trained on essays and scores unrelated to the practical scenarios of EFL writing education, and they usually provided only a single holistic score due to the lack of appropriate datasets. In this paper, we release DREsS, a large-scale, standard dataset for rubric-based automated essay scoring. DREsS comprises three sub-datasets: DREsS_New, DREsS_Std., and DREsS_CASE. We collect DREsS_New, a real-classroom dataset of 2.3K essays authored by EFL undergraduate students and scored by English education experts. We also standardize existing rubric-based essay scoring datasets as DREsS_Std. We propose CASE, a corruption-based augmentation strategy for essays, which generates the 40.1K synthetic samples of DREsS_CASE and improves the baseline results by 45.44%. DREsS will enable further research toward a more accurate and practical AES system for EFL writing education.