Detecting AI-generated text is increasingly necessary to combat misuse of LLMs in education, business compliance, journalism, and social media, where synthetic fluency can mask misinformation or deception. While prior detectors often rely on token-level likelihoods or opaque black-box classifiers, these approaches struggle against high-quality generations and offer little interpretability. In this work, we propose DivEye, a novel detection framework that captures how unpredictability fluctuates across a text using surprisal-based features. Motivated by the observation that human-authored text exhibits richer variability in lexical and structural unpredictability than LLM outputs, DivEye encodes this signal through a set of interpretable statistical features. Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines across multiple benchmarks. DivEye is robust to paraphrasing and adversarial attacks, generalizes well across domains and models, and improves the performance of existing detectors by up to 18.7% when used as an auxiliary signal. Beyond detection, DivEye provides interpretable insights into why a text is flagged, pointing to rhythmic unpredictability as a powerful and underexplored signal for LLM detection.
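To make the core idea concrete, the following is a minimal sketch of surprisal-variability features of the kind the abstract describes. It is not the DivEye implementation: the unigram "model" estimated from a toy corpus, the function name `surprisal_features`, and the specific statistics (variance and mean absolute successive difference of per-token surprisal, a rough proxy for "rhythmic unpredictability") are all illustrative assumptions; a real system would use an LLM's token-level log-probabilities.

```python
import math
from collections import Counter

def surprisal_features(tokens, probs):
    """Summary statistics of the per-token surprisal sequence.

    tokens: list of word tokens
    probs:  dict mapping token -> probability under some language model
    """
    # Surprisal (in bits) of each token under the model.
    s = [-math.log2(probs[t]) for t in tokens]
    mean = sum(s) / len(s)
    # Variance of surprisal: how widely unpredictability varies overall.
    var = sum((x - mean) ** 2 for x in s) / len(s)
    # Mean absolute successive difference: how sharply unpredictability
    # fluctuates from one token to the next (a crude "rhythm" measure).
    masd = sum(abs(a - b) for a, b in zip(s, s[1:])) / (len(s) - 1)
    return {"mean": mean, "variance": var, "succ_diff": masd}

# Toy unigram model estimated from a tiny corpus (illustrative only;
# DivEye would instead score tokens with an actual LLM).
corpus = "the cat sat on the mat the dog sat on the log".split()
counts = Counter(corpus)
total = sum(counts.values())
probs = {w: c / total for w, c in counts.items()}

feats = surprisal_features("the cat sat on the log".split(), probs)
```

A detector built on this idea would feed such features (computed over many positions and scales) into a simple, interpretable classifier, rather than relying on a single aggregate likelihood.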