From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape

Receiving immediate and personalized feedback is crucial for second-language learners, and Automated Essay Scoring (AES) systems are a vital resource when human instructors are unavailable. This study investigates the effectiveness of Large Language Models (LLMs), specifically GPT-4 and fine-tuned GPT-3.5, as tools for AES. Our comprehensive set of experiments, conducted on both public and private datasets, highlights the clear advantages of LLM-based AES systems. These advantages include superior accuracy, consistency, generalizability, and interpretability, with fine-tuned GPT-3.5 surpassing traditional grading models. Additionally, we conduct LLM-assisted human evaluation experiments involving both novice and expert graders. A key finding is that LLMs not only automate the grading process but also enhance the performance of human graders. Novice graders, when provided with LLM-generated feedback, achieve accuracy on par with experts, while experts become more efficient and more consistent in their assessments. These results underscore the potential of LLMs in educational technology, paving the way for effective collaboration between humans and AI and, ultimately, for transformative learning experiences through AI-generated feedback.