Writing is a foundational literacy skill that underpins effective communication, fosters critical thinking, facilitates learning across disciplines, and enables individuals to organize and articulate complex ideas. Consequently, writing assessment plays a vital role in evaluating language proficiency, communicative effectiveness, and analytical reasoning. The rapid advancement of large language models (LLMs) has made it increasingly easy to generate coherent, high-quality essays, raising significant concerns about the authenticity of student-submitted work. This chapter first provides an overview of the current landscape of detectors for AI-generated and AI-assisted essays, along with guidelines for their responsible use. It then presents empirical analyses to evaluate how well detectors trained on essays from one LLM generalize to identifying essays produced by other LLMs, based on essays generated in response to public GRE writing prompts. These findings provide guidance for developing and retraining detectors for practical applications.