The efficacy of a detector for large language model (LLM) generated text depends substantially on the availability of sizable training data. White-box zero-shot detectors require no such data, but they are limited by their need for access to the source model that produced the LLM-generated text. In this paper, we propose a simple but effective black-box zero-shot detection approach, predicated on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts. The approach computes a Grammar Error Correction Score (GECScore) for the given text and uses it to distinguish human-written from LLM-generated text. Extensive experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods, achieving an average AUROC of 98.7% and showing strong robustness against paraphrase and adversarial perturbation attacks.
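To make the idea concrete, the following is a minimal sketch of how such a score might be computed. The abstract does not define GECScore, so this sketch assumes it is a similarity between a text and its grammar-corrected version: a text that needs few corrections scores close to 1 and is treated as more likely LLM-generated. The names `toy_correct`, `gec_score`, and `looks_llm_generated`, the lookup-table corrector, the token-level `difflib` similarity, and the 0.95 threshold are all illustrative assumptions, not the authors' implementation.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical stand-in for a real grammar-error-correction (GEC) model.
# A practical detector would call an actual GEC system here.
TOY_FIXES = {"an large": "a large", "an simple": "a simple", "teh": "the"}


def toy_correct(text: str) -> str:
    """Apply a tiny lookup table of corrections (illustration only)."""
    for wrong, right in TOY_FIXES.items():
        text = text.replace(wrong, right)
    return text


def gec_score(text: str, correct: Callable[[str], str] = toy_correct) -> float:
    """Token-level similarity in [0, 1] between a text and its corrected form.

    Assumed definition: 1.0 means the corrector changed nothing.
    """
    corrected = correct(text)
    return SequenceMatcher(None, text.split(), corrected.split()).ratio()


def looks_llm_generated(
    text: str,
    threshold: float = 0.95,  # assumed value; would be tuned in practice
    correct: Callable[[str], str] = toy_correct,
) -> bool:
    """Flag a text as likely LLM-generated when few corrections were needed."""
    return gec_score(text, correct) >= threshold


if __name__ == "__main__":
    clean = "We propose a simple method for the task."
    noisy = "We propose an simple method for teh task."
    print(gec_score(clean), looks_llm_generated(clean))
    print(gec_score(noisy), looks_llm_generated(noisy))
```

Because the decision depends only on the text and an external corrector, a detector of this shape needs neither training data nor access to the generating model, which matches the black-box zero-shot setting described above.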