AI humanizers are a new class of online software tools that paraphrase and rewrite AI-generated text so that it evades AI detection software. We study 19 AI humanizer and paraphrasing tools and qualitatively assess their effects and how faithfully they preserve the meaning of the original text. We show that many existing AI detectors fail to detect humanized text. Finally, we demonstrate a robust model, built with a data-centric augmentation approach, that detects humanized AI text while maintaining a low false positive rate. We also attack our own detector by fine-tuning a model optimized against the detector's predictions, and show that the detector's cross-humanizer generalization is sufficient for it to remain robust to this attack.