With the emergence of widely available, powerful large language models (LLMs), disinformation generated by LLMs has become a major concern. LLM detectors have been touted as a solution, but their effectiveness in the real world remains to be proven. In this paper, we focus on an important setting in information operations: short news-like posts generated by moderately sophisticated attackers. We demonstrate that existing LLM detectors, whether zero-shot or purpose-trained, are not ready for real-world use in that setting. All tested zero-shot detectors perform inconsistently with prior benchmarks and are highly vulnerable to an increase in sampling temperature, a trivial attack absent from recent benchmarks. A purpose-trained detector that generalizes across LLMs and unseen attacks can be developed, but it fails to generalize to new human-written texts. We argue that the former indicates the need for domain-specific benchmarking, while the latter suggests a trade-off between adversarial-evasion resilience and overfitting to the reference human text; both properties need to be evaluated by benchmarks but currently are not. We believe these findings call for a reconsideration of current LLM-detector benchmarking approaches, and we provide a dynamically extensible benchmark to support it (https://github.com/Reliable-Information-Lab-HEVS/benchmark_llm_texts_detection).