Neural approaches to the automatic evaluation of subjective responses have shown superior performance and efficiency compared to traditional rule-based and feature-engineering-oriented solutions. However, it remains unclear whether the proposed neural solutions are adequate replacements for human raters, as we find that recent works do not properly account, during model training and validation, for rubric items that are essential to automated essay scoring. In this paper, we propose a series of data augmentation operations for training and testing an automated scoring model so that it learns features and functions overlooked by previous works, while still achieving state-of-the-art performance on the Automated Student Assessment Prize dataset.