Detecting AI-generated text is increasingly necessary to combat misuse of LLMs in education, business compliance, journalism, and social media, where synthetic fluency can mask misinformation or deception. While prior detectors often rely on token-level likelihoods or opaque black-box classifiers, these approaches struggle against high-quality generations and offer little interpretability. In this work, we propose DivEye, a novel detection framework that captures how unpredictability fluctuates across a text using surprisal-based features. Motivated by the observation that human-authored text exhibits richer variability in lexical and structural unpredictability than LLM outputs, DivEye encodes this signal through a set of interpretable statistical features. Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines across multiple benchmarks. DivEye is robust to paraphrasing and adversarial attacks, generalizes well across domains and models, and improves the performance of existing detectors by up to 18.7% when used as an auxiliary signal. Beyond detection, DivEye provides interpretable insights into why a text is flagged, pointing to rhythmic unpredictability as a powerful and underexplored signal for LLM detection.
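To make the core idea concrete, the following is a minimal sketch of surprisal-variability features of the kind the abstract describes. It is not the DivEye implementation: the unigram "model" estimated from a toy corpus, the function name `surprisal_features`, and the specific statistics (variance and mean absolute successive difference of per-token surprisal, a rough proxy for "rhythmic unpredictability") are all illustrative assumptions; a real system would use an LLM's token-level log-probabilities.

```python
import math
from collections import Counter

def surprisal_features(tokens, probs):
    """Summary statistics of the per-token surprisal sequence.

    tokens: list of word tokens
    probs:  dict mapping token -> probability under some language model
    """
    # Surprisal (in bits) of each token under the model.
    s = [-math.log2(probs[t]) for t in tokens]
    mean = sum(s) / len(s)
    # Variance of surprisal: how widely unpredictability varies overall.
    var = sum((x - mean) ** 2 for x in s) / len(s)
    # Mean absolute successive difference: how sharply unpredictability
    # fluctuates from one token to the next (a crude "rhythm" measure).
    masd = sum(abs(a - b) for a, b in zip(s, s[1:])) / (len(s) - 1)
    return {"mean": mean, "variance": var, "succ_diff": masd}

# Toy unigram model estimated from a tiny corpus (illustrative only;
# DivEye would instead score tokens with an actual LLM).
corpus = "the cat sat on the mat the dog sat on the log".split()
counts = Counter(corpus)
total = sum(counts.values())
probs = {w: c / total for w, c in counts.items()}

feats = surprisal_features("the cat sat on the log".split(), probs)
```

A detector built on this idea would feed such features (computed over many positions and scales) into a simple, interpretable classifier, rather than relying on a single aggregate likelihood.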