Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that requires only simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
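The abstract describes the score only as a contrast between two closely related language models. The sketch below assumes the ratio form given in the full Binoculars paper. That form takes the log-perplexity of the text under one model (M1) and divides it by the cross-perplexity, which is the average cross-entropy between M1's and a second model M2's next-token distributions. Hand-written toy probability lists stand in for real LLM outputs. The function names and the `threshold` parameter are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Sequence

# Each inner sequence is one model's next-token probability distribution at one
# position of the text; the distribution at position i predicts token_ids[i].
Dist = Sequence[float]


def log_perplexity(m1_probs: Sequence[Dist], token_ids: Sequence[int]) -> float:
    """Mean negative log-likelihood of the observed tokens under model M1."""
    nll = [-math.log(dist[tok]) for dist, tok in zip(m1_probs, token_ids)]
    return sum(nll) / len(nll)


def log_cross_perplexity(m1_probs: Sequence[Dist], m2_probs: Sequence[Dist]) -> float:
    """Mean cross-entropy of M2's predictions, weighted by M1's distribution."""
    total = 0.0
    for p, q in zip(m1_probs, m2_probs):
        # Entries where M1 assigns zero probability contribute nothing (0 * log q).
        total += -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(m1_probs)


def binoculars_score(
    m1_probs: Sequence[Dist], m2_probs: Sequence[Dist], token_ids: Sequence[int]
) -> float:
    """Assumed form: log-perplexity divided by log cross-perplexity."""
    return log_perplexity(m1_probs, token_ids) / log_cross_perplexity(m1_probs, m2_probs)


def looks_machine_generated(score: float, threshold: float) -> bool:
    """Hypothetical decision rule: scores below the threshold are flagged."""
    return score < threshold


if __name__ == "__main__":
    # Toy two-token vocabulary; the two "models" disagree slightly.
    m1 = [[0.5, 0.5], [0.9, 0.1]]
    m2 = [[0.6, 0.4], [0.8, 0.2]]
    predictable = binoculars_score(m1, m2, [0, 0])  # likely tokens chosen
    surprising = binoculars_score(m1, m2, [1, 1])   # unlikely tokens chosen
    print(f"predictable text: {predictable:.3f}, surprising text: {surprising:.3f}")
```

Under this assumed form, text that is more predictable than the two models' disagreement would lead one to expect receives a low score. A detector would therefore flag scores below a threshold chosen on human-written text to meet a target false positive rate, such as the 0.01% quoted above.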