With the rapidly increasing application of large language models (LLMs), their abuse has caused many undesirable societal problems such as fake news, academic dishonesty, and information pollution. This makes AI-generated text (AIGT) detection of great importance. Among existing methods, white-box methods are generally superior to black-box methods in terms of performance and generalizability, but they require access to LLMs' internal states and are not applicable to black-box settings. In this paper, we propose to estimate word generation probabilities as pseudo white-box features via multiple re-sampling to help improve AIGT detection under the black-box setting. Specifically, we design POGER, a proxy-guided efficient re-sampling method, which selects a small subset of representative words (e.g., 10 words) for performing multiple re-sampling in black-box AIGT detection. Experiments on datasets containing texts from humans and seven LLMs show that POGER outperforms all baselines in macro F1 under black-box, partial white-box, and out-of-distribution settings and maintains lower re-sampling costs than its existing counterparts.
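The central idea, estimating a word's generation probability by repeatedly re-sampling from a black-box model, can be illustrated with a minimal sketch. This is not the POGER implementation: the proxy-guided selection of representative words is not shown, and the names `estimate_word_probability`, `sample_next_word`, and `make_toy_sampler` are hypothetical. The sketch assumes the black-box model is exposed only as a callable that returns one sampled next word for a given prefix, and it uses the empirical frequency of the observed word over the re-samples as the probability estimate.

```python
import random
from collections import Counter
from typing import Callable, Dict


def estimate_word_probability(
    sample_next_word: Callable[[str], str],
    prefix: str,
    target: str,
    n_samples: int = 100,
) -> float:
    """Estimate P(target | prefix) as the fraction of re-samples equal to target.

    `sample_next_word` is an assumed stand-in for one black-box LLM call that
    returns a single sampled next word; no internal model state is accessed.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    counts = Counter(sample_next_word(prefix) for _ in range(n_samples))
    return counts[target] / n_samples


def make_toy_sampler(dist: Dict[str, float], seed: int = 0) -> Callable[[str], str]:
    """Build a toy sampler with a fixed next-word distribution (illustration only)."""
    rng = random.Random(seed)
    words = list(dist)
    weights = [dist[w] for w in words]

    def sample(prefix: str) -> str:
        # A real black-box model would condition on `prefix`; the toy one ignores it.
        return rng.choices(words, weights=weights, k=1)[0]

    return sample


if __name__ == "__main__":
    sampler = make_toy_sampler({"cat": 0.7, "dog": 0.2, "bird": 0.1})
    estimate = estimate_word_probability(sampler, "The pet is a", "cat", n_samples=2000)
    print(f"estimated P('cat' | prefix) = {estimate:.3f}")
```

Each estimate costs `n_samples` model calls per word, which is why restricting re-sampling to a small subset of words (e.g., 10 words, as stated above) keeps the overall cost manageable.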