The rapid proliferation of Large Language Models has made it significantly harder to distinguish human-written text from AI-generated text, raising critical issues across academic, editorial, and social domains. This paper investigates AI-generated text detection through the design, implementation, and comparative evaluation of multiple machine-learning-based detectors. Four neural architectures are developed and analyzed: a Multilayer Perceptron, a one-dimensional Convolutional Neural Network, a MobileNet-based CNN, and a Transformer model. The proposed models are benchmarked against widely used online detectors, including ZeroGPT, GPTZero, QuillBot, Originality.AI, Sapling, IsGen, Rephrase, and Writer. Experiments are conducted on the COLING Multilingual Dataset, in both English and Italian configurations, and on an original thematic dataset focused on Art and Mental Health. Results show that the supervised detectors achieve more stable and robust performance than the commercial tools across languages and domains, highlighting key strengths and limitations of current detection strategies.