Many commercial and open-source models claim to detect machine-generated text with extremely high accuracy (99% or more). However, very few of these detectors are evaluated on shared benchmark datasets, and even when they are, the datasets used for evaluation are insufficiently challenging: they lack variation in sampling strategies, adversarial attacks, and open-source generative models. In this work we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks, and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open-source and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategy, repetition penalties, and unseen generative models. We release our data along with a leaderboard to encourage future research.
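To make the decoding-strategy axis concrete, the sketch below generates text from one prompt under four decoding configurations, including a repetition penalty; detectors trained on output from one configuration often misclassify output from another. This is a minimal illustration using the Hugging Face transformers API, not the paper's actual generation pipeline; the model name (gpt2) and all parameter values are illustrative assumptions.

```python
# Minimal sketch of decoding-strategy variation (not the RAID pipeline).
# Model choice and hyperparameter values are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The history of aviation began"
inputs = tokenizer(prompt, return_tensors="pt")

# Four decoding configurations of the kind the benchmark varies over.
configs = {
    "greedy": dict(do_sample=False),
    "sampling": dict(do_sample=True, temperature=1.0),
    "nucleus": dict(do_sample=True, top_p=0.9),
    "greedy_rep_penalty": dict(do_sample=False, repetition_penalty=1.2),
}

for name, kwargs in configs.items():
    output = model.generate(**inputs, max_new_tokens=40, **kwargs)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f"--- {name} ---\n{text}\n")
```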