Modern large language models (LLMs) such as GPT, Claude, and Gemini have transformed the way we learn, work, and communicate. Yet their ability to produce highly human-like text raises serious concerns about misinformation and academic integrity, creating an urgent need for reliable algorithms that detect LLM-generated content. In this paper, we first present a geometric approach to demystify rewrite-based detection algorithms, revealing their underlying rationale and demonstrating their generalization ability. Building on this insight, we introduce a novel rewrite-based detection algorithm that adaptively learns the distance between the original and rewritten text. Theoretically, we show that an adaptively learned distance function is more effective for detection than a fixed distance. Empirically, we conduct extensive experiments across more than 100 settings and find that our approach outperforms baseline algorithms in the majority of scenarios. In particular, it achieves relative improvements of 57.8\% to 80.6\% over the strongest baseline across different target LLMs (e.g., GPT, Claude, and Gemini).