This study addresses critical gaps in the ability of Automated Essay Scoring (AES) systems and Large Language Models (LLMs) to identify and score harmful essays. Despite advances in AES technology, current models often overlook ethically and morally problematic elements within essays and erroneously assign high scores to essays that may propagate harmful opinions. We introduce the Harmful Essay Detection (HED) benchmark, which includes essays that incorporate sensitive topics such as racism and gender bias, to test how well various LLMs recognize and score harmful content. Our findings reveal that (1) LLMs require further enhancement to accurately distinguish between harmful and argumentative essays, and (2) both current AES models and LLMs fail to consider the ethical dimensions of content during scoring. The study underscores the need for more robust AES systems that are sensitive to the ethical implications of the content they score.