ChatGPT brings revolutionary social value but also raises concerns about the misuse of AI-generated content. An important question, therefore, is how to detect whether content was generated by ChatGPT or written by a human. Existing detectors assume that distributional gaps exist between human-generated and AI-generated content, and they typically identify these gaps using statistical information or classifiers. Our research challenges this distributional gap assumption. We find that detectors do not effectively discriminate the semantic and stylistic gaps between human-generated and AI-generated content. Instead, "subtle differences", such as an extra space, become crucial for detection. Based on this discovery, we propose the SpaceInfi strategy to evade detection. Experiments demonstrate the effectiveness of this strategy across multiple benchmarks and detectors. We also provide a theoretical explanation of why SpaceInfi succeeds in evading perplexity-based detection. Our findings offer new insights and challenges for understanding and constructing more applicable ChatGPT detectors.
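To make the "extra space" idea concrete, the following is a minimal sketch of a single-space perturbation. The abstract does not specify where the space is placed, so the choice of inserting it before a randomly selected comma is an assumption made here for illustration, and the function name `add_single_space` is hypothetical rather than taken from the paper.

```python
import random
from typing import Optional


def add_single_space(text: str, rng: Optional[random.Random] = None) -> str:
    """Return `text` with one extra space inserted before a randomly chosen comma.

    Assumption: the placement rule (before a comma) is illustrative only;
    the abstract states merely that a single extra space is added.
    """
    rng = rng or random.Random()
    # Collect the index of every comma in the text.
    comma_positions = [i for i, ch in enumerate(text) if ch == ","]
    if not comma_positions:
        # No comma available: leave the text unchanged in this sketch.
        return text
    pos = rng.choice(comma_positions)
    # Insert exactly one space immediately before the chosen comma.
    return text[:pos] + " " + text[pos:]


if __name__ == "__main__":
    sample = "The model is fluent, coherent, and fast."
    print(add_single_space(sample, random.Random(0)))
```

The perturbed text differs from the original by a single character and leaves the wording, and hence the semantics and style, untouched, which is the kind of "subtle difference" the abstract refers to.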