The potential of generative AI technologies such as large language models (LLMs) to revolutionize education is undermined by ethical concerns about their misuse, which worsens the problem of academic dishonesty. LLMs such as GPT-4 and Llama 2 are becoming increasingly capable of generating sophisticated content and answering questions, from writing academic essays to solving complex math problems. Students are relying on these LLMs to complete their assignments, thereby compromising academic integrity. Existing solutions for detecting LLM-generated text are compute-intensive and often generalize poorly. This paper presents a novel approach for detecting LLM-generated text using a visual representation of word embeddings. We formulate a novel convolutional neural network called ZigZag ResNet, as well as a scheduler for improving generalization, named ZigZag Scheduler. In an extensive evaluation on datasets of text generated by six different state-of-the-art LLMs, our model demonstrates strong intra-domain and inter-domain generalization. Our best model detects AI-generated text with an average detection rate (over inter- and intra-domain test data) of 88.35%. An exhaustive ablation study shows that ZigZag ResNet and ZigZag Scheduler provide a performance improvement of nearly 4% over the vanilla ResNet. The end-to-end inference latency of our model is below 2.5 ms per sentence. Our solution offers a lightweight, computationally efficient, and faster alternative to existing tools for AI-generated text detection, with better generalization performance. It can help academic institutions in their fight against the misuse of LLMs in academic settings. Through this work, we aim to contribute to safeguarding the principles of academic integrity and ensuring the trustworthiness of student work in the era of advanced LLMs.
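As a rough illustration of what a "visual representation of word embeddings" could look like, the sketch below stacks one vector per word into a fixed-size 2D array that a convolutional network could take as a single-channel image. The abstract does not specify the embedding model, the image layout, or the normalization, so everything here is an assumption: `toy_embedding`, `sentence_to_image`, `EMBED_DIM`, and `MAX_TOKENS` are hypothetical names, and the deterministic random vectors merely stand in for a pretrained embedding table.

```python
import zlib

import numpy as np

EMBED_DIM = 32   # hypothetical embedding width (image width)
MAX_TOKENS = 32  # hypothetical fixed sentence length (image height)


def toy_embedding(token: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Stand-in for a pretrained word-embedding lookup (assumption).

    Seeds a random generator with the CRC32 of the token, so the same
    word always maps to the same vector.
    """
    seed = zlib.crc32(token.encode("utf-8"))
    return np.random.default_rng(seed).standard_normal(dim)


def sentence_to_image(sentence: str) -> np.ndarray:
    """Stack per-word vectors into a (MAX_TOKENS, EMBED_DIM) array in [0, 1]."""
    # Naive whitespace tokenization; longer sentences are truncated.
    tokens = sentence.lower().split()[:MAX_TOKENS]

    # Rows beyond the last token stay zero (padding).
    image = np.zeros((MAX_TOKENS, EMBED_DIM), dtype=np.float32)
    for row, token in enumerate(tokens):
        image[row] = toy_embedding(token)

    # Min-max scale the whole array to [0, 1]; the zero padding rows end up
    # at a uniform intermediate value. An all-zero array is returned as-is.
    lo, hi = image.min(), image.max()
    if hi > lo:
        image = (image - lo) / (hi - lo)
    return image
```

Calling `sentence_to_image("Students rely on language models")` returns a 32 × 32 array whose first five rows encode the words and whose remaining rows are padding. A classifier such as a ResNet variant would then be trained on such arrays, though the actual input pipeline of ZigZag ResNet may differ.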