Large language models (LLMs) have exhibited remarkable capabilities in text generation tasks. However, their use carries inherent risks, including plagiarism, the dissemination of fake news, and misuse in educational exercises. Although several detectors have been proposed to address these concerns, their effectiveness against adversarial perturbations, specifically in the context of student essay writing, remains largely unexplored. This paper aims to bridge this gap by constructing AIG-ASAP, a dataset of AI-generated student essays, built with a range of text perturbation methods that are expected to produce high-quality essays while evading detection. Through empirical experiments, we assess the performance of current AI-generated content (AIGC) detectors on the AIG-ASAP dataset. The results reveal that existing detectors can be easily circumvented by straightforward automatic adversarial attacks. Specifically, we explore word-substitution and sentence-substitution perturbation methods that effectively evade detection while maintaining the quality of the generated essays. This highlights the urgent need for more accurate and robust methods to detect AI-generated student essays in the education domain.