Recently there have been many shared tasks targeting the detection of generated text from Large Language Models (LLMs). However, these shared tasks tend to focus either on cases where text is limited to one particular domain or on cases where text can come from many domains, some of which may not be seen until test time. In this shared task, using the newly released RAID benchmark, we aim to answer whether models can detect generated text from a large, yet fixed, number of domains and LLMs, all of which are seen during training. Over the course of three months, our task was attempted by 9 teams with 23 detector submissions. We find that multiple participants were able to obtain accuracies of over 99% on machine-generated text from RAID while maintaining a 5% False Positive Rate, suggesting that detectors can robustly detect text from many domains and models simultaneously. We discuss potential interpretations of this result and provide directions for future research.
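The headline metric, accuracy on machine-generated text at a fixed 5% false positive rate, is typically computed by first calibrating each detector's decision threshold on human-written text and then scoring the generated text against that threshold. Below is a minimal sketch of this procedure; the function name and the synthetic score distributions are illustrative assumptions, not the task's actual evaluation code.

```python
import numpy as np

def accuracy_at_fpr(human_scores, machine_scores, target_fpr=0.05):
    """Calibrate a threshold so that at most `target_fpr` of human-written
    texts are flagged, then report detection accuracy (true positive rate)
    on machine-generated texts at that threshold."""
    # Scores above the (1 - target_fpr) quantile of the human-text scores
    # flag at most ~5% of human documents by construction.
    threshold = np.quantile(np.asarray(human_scores), 1.0 - target_fpr)
    return float(np.mean(np.asarray(machine_scores) > threshold))

# Hypothetical detector scores in [0, 1], higher = more likely machine-generated.
rng = np.random.default_rng(0)
human = rng.beta(2, 5, size=1000)    # synthetic scores on human-written text
machine = rng.beta(5, 2, size=1000)  # synthetic scores on generated text
print(f"Accuracy at 5% FPR: {accuracy_at_fpr(human, machine):.3f}")
```

Calibrating per detector, rather than using a single global cutoff, keeps the comparison fair across detectors whose raw scores live on different scales.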