We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when asked to rewrite it. This tendency arises because LLMs often perceive AI-generated text as high quality and therefore make fewer modifications to it. We introduce a method that detects AI-generated content by prompting LLMs to rewrite a text and computing the edit distance between the input and the rewritten output. We call our method Raidar (geneRative AI Detection viA Rewriting). Raidar significantly improves the F1 detection scores of existing AI content detection models, both academic and commercial, across various domains, including news, creative writing, student essays, code, Yelp reviews, and arXiv papers, with gains of up to 29 points. Because it operates solely on word symbols, without high-dimensional features, our method is compatible with black-box LLMs and is inherently robust to new content. Our results illustrate the unique imprint of machine-generated text through the lens of the machines themselves.
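The core rewrite-and-compare idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Raidar implementation: we assume a word-level Levenshtein distance normalized by the longer sequence and a fixed decision threshold, with the rewriter passed in as a plain function. In practice the rewriter would be an LLM called with a rewriting prompt; here `toy_rewrite` and its `POLISH` table are hypothetical stand-ins so the example runs offline, and `threshold=0.2` is an arbitrary illustrative value. The paper's actual prompts, features, and classifier may differ.

```python
from typing import Callable, Sequence

def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two token sequences (each insert/delete/substitute costs 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute, or keep on a match
        prev = curr
    return prev[-1]

def rewrite_score(text: str, rewrite: Callable[[str], str]) -> float:
    """Word-level edit distance between text and its rewrite, normalized to [0, 1].

    Assumption: normalizing by the longer token sequence; the paper may use other measures.
    """
    original = text.split()
    rewritten = rewrite(text).split()
    denom = max(len(original), len(rewritten), 1)  # avoid division by zero on empty input
    return levenshtein(original, rewritten) / denom

def looks_ai_generated(text: str, rewrite: Callable[[str], str],
                       threshold: float = 0.2) -> bool:
    """Few edits suggest the rewriter already 'liked' the text, i.e. likely AI-generated.

    The threshold is a hypothetical value; in practice it would be calibrated on labeled data.
    """
    return rewrite_score(text, rewrite) < threshold

# Hypothetical stand-in for an LLM rewriter: it only "polishes" a few informal words.
POLISH = {"kinda": "somewhat", "gonna": "will", "u": "you"}

def toy_rewrite(text: str) -> str:
    return " ".join(POLISH.get(word, word) for word in text.split())

if __name__ == "__main__":
    human = "honestly the food was kinda cold and the staff was gonna ignore u"
    machine = "The food was served cold and the staff seemed inattentive."
    for label, text in [("human", human), ("machine", machine)]:
        score = rewrite_score(text, toy_rewrite)
        print(label, round(score, 3), looks_ai_generated(text, toy_rewrite))
    # prints:
    # human 0.231 False
    # machine 0.0 True
```

A text the rewriter barely changes (low score) is flagged as likely AI-generated, while a heavily edited one is treated as likely human-written. With a real LLM, one would calibrate the threshold, or train a classifier on such edit-distance scores, using labeled examples rather than fixing it by hand.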