In this paper, we study the problem of detecting machine-generated text when the large language model (LLM) that may have produced it is unknown. We do so by applying ensembling methods to the outputs of DetectGPT classifiers (Mitchell et al., 2023), a zero-shot method for machine-generated text detection that is highly accurate when the generative (or base) language model is the same as the discriminative (or scoring) language model. We find that simple summary statistics of DetectGPT sub-model outputs yield an AUROC of 0.73 (compared with 0.61), while retaining the zero-shot nature of the approach, and that supervised learning methods sharply boost accuracy to an AUROC of 0.94 at the cost of requiring a training dataset. This suggests the possibility of further generalisation towards a highly accurate, model-agnostic detector of machine-generated text.
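The summary-statistic ensembling described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-model scores are toy numbers standing in for DetectGPT curvature scores from several scoring models, the mean is used as the summary statistic, and AUROC is computed directly from its rank-based definition.

```python
# Hedged sketch of ensembling DetectGPT sub-model outputs with a
# summary statistic, evaluated by AUROC. All scores are illustrative.

def auroc(pos, neg):
    """AUROC = P(positive score > negative score), with ties counted half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ensemble_score(per_model_scores):
    """Summary statistic over sub-model outputs: here, the mean."""
    return sum(per_model_scores) / len(per_model_scores)

# Toy per-text scores (rows: texts, columns: scoring models).
machine = [[0.8, 0.6, 0.7], [0.5, 0.9, 0.6], [0.7, 0.7, 0.8]]
human = [[0.1, 0.4, 0.2], [0.3, 0.2, 0.5], [0.6, 0.1, 0.3]]

m_scores = [ensemble_score(s) for s in machine]
h_scores = [ensemble_score(s) for s in human]
print(auroc(m_scores, h_scores))
```

A supervised variant, as in the paper's second result, would instead feed the per-model score vectors into a trained classifier rather than a fixed summary statistic.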