AI-Generated Text Detection

The rapid development of large language models (LLMs) has increased the volume of AI-generated text, and students increasingly submit LLM-generated content as their own work, which violates academic integrity. This paper evaluates AI text detection methods, covering both traditional machine learning models and transformer-based architectures. We combine two datasets, HC3 and DAIGT v2, into a unified benchmark and apply a topic-based data split to prevent information leakage, so that the evaluation reflects generalization to unseen domains rather than topic memorization. Our experiments show that TF-IDF with logistic regression provides a reasonable baseline at 82.87% accuracy, and that deep learning models outperform it: the BiLSTM classifier reaches 88.86% accuracy, while DistilBERT reaches a similar 88.11% with the highest ROC-AUC of 0.96, the strongest overall performance. The results indicate that contextual semantic modeling is significantly superior to lexical features and highlight the importance of mitigating topic memorization through appropriate evaluation protocols. The limitations of this work relate primarily to dataset diversity and computational constraints. In future work, we plan to expand dataset diversity and use parameter-efficient fine-tuning methods such as LoRA. We also plan to explore smaller or distilled models, more efficient batching strategies, and hardware-aware optimization.
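The abstract does not spell out how the topic-based split is built. One common reading, offered here only as a minimal sketch, is that whole topics are assigned to either the training set or the test set, so that no topic appears in both. The record keys (`text`, `label`, `topic`), the function name `topic_based_split`, and the 20% test fraction are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def topic_based_split(records, test_fraction=0.2, seed=0):
    """Split records so that no topic appears in both train and test.

    `records` is a list of dicts with the hypothetical keys
    "text", "label", and "topic". Whole topics, not individual
    samples, are assigned to one side of the split.
    """
    # Group samples by their topic.
    by_topic = defaultdict(list)
    for rec in records:
        by_topic[rec["topic"]].append(rec)

    # Shuffle the topic names deterministically.
    topics = sorted(by_topic)
    random.Random(seed).shuffle(topics)

    # Hold out a fraction of the topics (at least one) for testing.
    # Note: with a single topic, the training set would be empty.
    n_test = max(1, round(len(topics) * test_fraction))
    held_out = set(topics[:n_test])

    train = [r for t in topics if t not in held_out for r in by_topic[t]]
    test = [r for t in topics if t in held_out for r in by_topic[t]]
    return train, test


# Toy data: 50 samples spread evenly over 5 made-up topics.
records = [
    {"text": f"sample {i}", "label": i % 2, "topic": f"topic_{i % 5}"}
    for i in range(50)
]
train, test = topic_based_split(records, test_fraction=0.2, seed=42)
train_topics = {r["topic"] for r in train}
test_topics = {r["topic"] for r in test}
```

Splitting by topic instead of by sample means a classifier cannot score well simply by recognizing subject matter it saw during training, which is the leakage the abstract refers to. Because whole topics are held out, the actual test-set size depends on how many samples each held-out topic contains.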

