ChatGPT brings revolutionary social value but also raises concerns about the misuse of AI-generated text. An important question is therefore how to detect whether a text was generated by ChatGPT or written by a human. Existing detectors are built on the assumption that there are distributional gaps between human-generated and AI-generated text, and these gaps are typically identified using statistical information or classifiers. Our research challenges this distributional-gap assumption. We find that detectors do not effectively capture the semantic and stylistic gaps between human-generated and AI-generated text; instead, "subtle differences", such as an extra space, become crucial for detection. Based on this discovery, we propose SpaceInfi, a strategy for evading detection. Experiments demonstrate its effectiveness across multiple benchmarks and detectors. We also provide a theoretical explanation for why SpaceInfi evades perplexity-based detection, and we show empirically that a phenomenon called token mutation causes the evasion of language-model-based detectors. Our findings offer new insights into, and pose new challenges for, understanding and building more applicable ChatGPT detectors.
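To make the "extra space" idea concrete, here is a minimal, hypothetical sketch of a SpaceInfi-style perturbation. The abstract only states that a single extra space is added. Placing that space before a randomly chosen comma, and leaving comma-free text unchanged, are assumptions made purely for illustration. The function name `space_infi` is likewise hypothetical and is not the authors' code. The edit is nearly invisible to a reader and leaves the meaning intact, yet it can change how a tokenizer splits the surrounding characters. That kind of change is the sort of "subtle difference" the abstract says detectors end up relying on.

```python
import random
from typing import Optional


def space_infi(text: str, rng: Optional[random.Random] = None) -> str:
    """Insert one extra space character before a randomly chosen comma.

    Hypothetical sketch: the "before a comma" placement and the
    "no comma -> unchanged" behaviour are illustrative assumptions.
    """
    rng = rng or random.Random()
    # Collect every index where a comma occurs.
    comma_positions = [i for i, ch in enumerate(text) if ch == ","]
    if not comma_positions:
        # Assumption: with no comma to target, return the text as-is.
        return text
    i = rng.choice(comma_positions)
    # Splice a single space immediately before the chosen comma.
    return text[:i] + " " + text[i:]


if __name__ == "__main__":
    print(space_infi("Hello, world", random.Random(0)))  # Hello , world
```

Passing a seeded `random.Random` makes the chosen position reproducible. The output is always exactly one character longer than the input whenever the text contains a comma.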