Detecting text generated by large language models (LLMs) has attracted considerable recent interest. With zero-shot methods such as DetectGPT, detection capabilities have reached impressive levels. However, the reliability of existing detectors in real-world applications remains underexplored. In this study, we present a new benchmark, DetectRL, which shows that even state-of-the-art (SOTA) detection techniques still underperform on this task. We collected human-written datasets from domains where LLMs are particularly prone to misuse. Using popular LLMs, we generated data that better aligns with real-world applications. Unlike previous studies, we employed heuristic rules to create adversarial LLM-generated text, simulating advanced prompt usage, human revisions such as word substitutions, and writing errors. Our development of DetectRL reveals the strengths and limitations of current SOTA detectors. More importantly, we analyzed the potential impact of writing style, model type, attack method, text length, and real-world human writing factors on different types of detectors. We believe DetectRL can serve as an effective benchmark for assessing detectors in real-world scenarios and can evolve with advanced attack methods, thus providing a more demanding evaluation that drives the development of more efficient detectors. Data and code are publicly available at: https://github.com/NLP2CT/DetectRL.
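Two of the heuristic perturbation types mentioned above, word substitution and writing errors, can be illustrated with a minimal Python sketch. This is not DetectRL's implementation: the synonym table, the function names `substitute_words`, `swap_adjacent_chars`, and `inject_typos`, and the `rate` parameter are hypothetical choices made for illustration, and the benchmark's actual attack rules are defined in the repository linked above.

```python
import random

# Hypothetical substitution table; DetectRL's real rules and word lists may differ.
SYNONYMS = {
    "important": "crucial",
    "show": "demonstrate",
    "use": "employ",
    "big": "large",
}


def substitute_words(text: str, table: dict[str, str], rate: float, rng: random.Random) -> str:
    """Replace each whitespace-separated word found in `table` with probability `rate`.

    Tokenization is deliberately naive: words with attached punctuation
    (e.g. "big,") are not matched.
    """
    out = []
    for word in text.split(" "):
        key = word.lower()
        if key in table and rng.random() < rate:
            out.append(table[key])
        else:
            out.append(word)
    return " ".join(out)


def swap_adjacent_chars(word: str, rng: random.Random) -> str:
    """Simulate a typing error by swapping two neighbouring characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def inject_typos(text: str, rate: float, rng: random.Random) -> str:
    """Apply an adjacent-character swap to each word with probability `rate`."""
    return " ".join(
        swap_adjacent_chars(w, rng) if rng.random() < rate else w
        for w in text.split(" ")
    )


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the perturbation is reproducible
    generated = "we show big gains and use important features"
    attacked = inject_typos(substitute_words(generated, SYNONYMS, 0.8, rng), 0.2, rng)
    print(attacked)
```

Chaining the two functions mimics a human lightly editing LLM output and then introducing slips while typing. Passing an explicit `random.Random` instance keeps each attacked sample reproducible, which matters when the same adversarial set is reused to compare detectors.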