Many commercial and open-source models claim to detect machine-generated text with very high accuracy (99% or higher). However, very few of these detectors are evaluated on shared benchmark datasets, and even when they are, the evaluation datasets are insufficiently challenging: they lack variation in sampling strategy, adversarial attacks, and open-source generative models. In this work, we present RAID, the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks, and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open-source and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategy, repetition penalties, and unseen generative models. We release our dataset and tools to encourage further exploration of detector robustness.