We study the problem of determining whether a piece of text was authored by a human or by a large language model (LLM). Existing state-of-the-art logits-based detectors rely on statistics derived from the log-probability of the observed text, evaluated under the distribution of a given source LLM. However, relying solely on log-probabilities can be sub-optimal. In response, we introduce AdaDetectGPT -- a novel classifier that adaptively learns a witness function from training data to enhance the performance of logits-based detectors. We provide statistical guarantees on its true positive rate, false positive rate, true negative rate, and false negative rate. Extensive numerical studies show that AdaDetectGPT nearly uniformly improves upon the state-of-the-art method across various combinations of datasets and LLMs, with improvements of up to 37\%. A Python implementation of our method is available at https://github.com/Mamba413/AdaDetectGPT.