On the Possibilities of AI-Generated Text Detection - 专知论文

会员服务 ·

0

文本检测 · 样本 · 检测器 · 信息理论 · 可行 ·

2023 年 4 月 10 日

On the Possibilities of AI-Generated Text Detection

翻译：关于AI生成文本检测可能性的研究

Souradip Chakraborty,Amrit Singh Bedi,Sicheng Zhu,Bang An,Dinesh Manocha,Furong Huang

Our work focuses on the challenge of detecting outputs generated by Large Language Models (LLMs) from those generated by humans. The ability to distinguish between the two is of utmost importance in numerous applications. However, the possibility and impossibility of such discernment have been subjects of debate within the community. Therefore, a central question is whether we can detect AI-generated text and, if so, when. In this work, we provide evidence that it should almost always be possible to detect the AI-generated text unless the distributions of human and machine generated texts are exactly the same over the entire support. This observation follows from the standard results in information theory and relies on the fact that if the machine text is becoming more like a human, we need more samples to detect it. We derive a precise sample complexity bound of AI-generated text detection, which tells how many samples are needed to detect. This gives rise to additional challenges of designing more complicated detectors that take in n samples to detect than just one, which is the scope of future research on this topic. Our empirical evaluations support our claim about the existence of better detectors demonstrating that AI-Generated text detection should be achievable in the majority of scenarios. Our results emphasize the importance of continued research in this area

翻译：本研究聚焦于区分大型语言模型生成内容与人类撰写内容的检测挑战。在众多应用场景中，区分二者的能力至关重要。然而，这种区分的可能性与局限性一直是学界争论的焦点。因此，核心问题在于：我们能否检测AI生成文本？若可检测，其条件为何？本研究表明，除非人类与机器生成文本的分布在完整支持集上完全一致，否则AI生成文本几乎总是可被检测。该结论源于信息论经典定理，其核心论据在于：若机器文本趋近人类水平，则需更多样本方能实现检测。我们推导出AI生成文本检测所需的精确样本复杂度，量化了检测所需样本数量。这一发现催生了新的研究挑战——相较于单样本检测，设计需要n个样本的更复杂检测器将成为该领域的未来研究方向。实证评估验证了更优检测器存在的论断，证明在多数场景下AI生成文本检测应可实现。研究结果凸显了持续探索该领域的重要性。

0

相关内容

文本检测

生成式对抗网络异常检测，GANs for Anomaly Detection

专知会员服务

34+阅读 · 2021年9月16日

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

专知会员服务

140+阅读 · 2020年7月10日

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

专知会员服务

46+阅读 · 2020年6月11日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

专知会员服务

39+阅读 · 2020年5月30日

【ACL2020】生成事实验证解释，Generating Fact Checking Explanations

【ACL2020】生成事实验证解释，Generating Fact Checking Explanations

专知会员服务

17+阅读 · 2020年4月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

异常检测（Anomaly Detection）综述

异常检测（Anomaly Detection）综述

极市平台

20+阅读 · 2020年10月24日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

专知

25+阅读 · 2018年5月28日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

上百份文字的检测与识别资源，包含数据集、code和paper

上百份文字的检测与识别资源，包含数据集、code和paper

数据挖掘入门与实战

17+阅读 · 2017年12月7日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

长链非编码RNA在细颗粒物（PM2.5）诱导肺癌发生的作用与机制

国家自然科学基金

0+阅读 · 2014年12月31日

Blimp-1对小鼠allo-HSCT后GVHD发病的调控作用和机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

体外定点糖基化修饰的内皮抑素活性短肽靶向性抗新生血管生成与抗肿瘤作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

缺氧诱导的microRNA调控TIP30表达促进胰腺癌转移作用机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

Tecto调节非洲爪蛙胚层决定与分化的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

PAI-1在肿瘤新生血管形成中对血管稳定性的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

DS证据理论中的基本概率指派智能化生成及应用研究

国家自然科学基金

0+阅读 · 2009年12月31日

DCC/UNC5H2对实验性脑梗死后远隔细胞凋亡和神经可塑性的作用

国家自然科学基金

0+阅读 · 2009年12月31日

Drp-1基因在内质网应激诱导胰岛β32454;胞凋亡中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

DiffusionNAG: Task-guided Neural Architecture Generation with Diffusion Models

Arxiv

0+阅读 · 2023年5月26日

GenerateCT: Text-Guided 3D Chest CT Generation

Arxiv

0+阅读 · 2023年5月26日

Improving Zero-shot Generalization and Robustness of Multi-modal Models

Improving Zero-shot Generalization and Robustness of Multi-modal Models

Arxiv

0+阅读 · 2023年5月25日

Active Learning for Natural Language Generation

Arxiv

0+阅读 · 2023年5月24日

Algorithmic Computability and Approximability of Capacity-Achieving Input Distributions

Arxiv

0+阅读 · 2023年5月23日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

An Attentive Survey of Attention Models

An Attentive Survey of Attention Models

Arxiv

44+阅读 · 2020年12月15日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

A Survey of Knowledge-Enhanced Text Generation

Arxiv

18+阅读 · 2020年10月9日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

1+阅读 · 今天14:49

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

1+阅读 · 今天14:47

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

1+阅读 · 今天14:45

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

2+阅读 · 今天14:22

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

3+阅读 · 今天13:50

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

2+阅读 · 今天13:33

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

2+阅读 · 今天13:30

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

2+阅读 · 今天13:28

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

2+阅读 · 今天13:13

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

专知会员服务

1+阅读 · 今天13:10

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

5+阅读 · 6月16日

多模态代码智能综述：从视觉输入到可执行代码系统

多模态代码智能综述：从视觉输入到可执行代码系统

专知会员服务

7+阅读 · 6月16日

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

专知会员服务

5+阅读 · 6月16日

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

专知会员服务

5+阅读 · 6月16日

《通用大语言模型：无人机指挥与控制接口》最新40页

《通用大语言模型：无人机指挥与控制接口》最新40页

专知会员服务

15+阅读 · 6月16日

相关VIP内容

生成式对抗网络异常检测，GANs for Anomaly Detection

专知会员服务

34+阅读 · 2021年9月16日

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

专知会员服务

140+阅读 · 2020年7月10日

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

专知会员服务

46+阅读 · 2020年6月11日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

回顾机器学习公平的数学框架，Review of Mathematical frameworks for Fairness in Machine Learning

专知会员服务

39+阅读 · 2020年5月30日

【ACL2020】生成事实验证解释，Generating Fact Checking Explanations

【ACL2020】生成事实验证解释，Generating Fact Checking Explanations

专知会员服务

17+阅读 · 2020年4月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

学习数据的几何：形状空间分析数学综述

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

异常检测（Anomaly Detection）综述

异常检测（Anomaly Detection）综述

极市平台

20+阅读 · 2020年10月24日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

【论文推荐】最新七篇图像描述生成相关论文—CNN+CNN、对抗样本、显著性和上下文注意力、条件生成对抗网络、风格化

专知

25+阅读 · 2018年5月28日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

上百份文字的检测与识别资源，包含数据集、code和paper

上百份文字的检测与识别资源，包含数据集、code和paper

数据挖掘入门与实战

17+阅读 · 2017年12月7日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

DiffusionNAG: Task-guided Neural Architecture Generation with Diffusion Models

Arxiv

0+阅读 · 2023年5月26日

GenerateCT: Text-Guided 3D Chest CT Generation

Arxiv

0+阅读 · 2023年5月26日

Improving Zero-shot Generalization and Robustness of Multi-modal Models

Improving Zero-shot Generalization and Robustness of Multi-modal Models

Arxiv

0+阅读 · 2023年5月25日

Active Learning for Natural Language Generation

Arxiv

0+阅读 · 2023年5月24日

Algorithmic Computability and Approximability of Capacity-Achieving Input Distributions

Arxiv

0+阅读 · 2023年5月23日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

An Attentive Survey of Attention Models

An Attentive Survey of Attention Models

Arxiv

44+阅读 · 2020年12月15日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

A Survey of Knowledge-Enhanced Text Generation

Arxiv

18+阅读 · 2020年10月9日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

相关基金

长链非编码RNA在细颗粒物（PM2.5）诱导肺癌发生的作用与机制

国家自然科学基金

0+阅读 · 2014年12月31日

Blimp-1对小鼠allo-HSCT后GVHD发病的调控作用和机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

体外定点糖基化修饰的内皮抑素活性短肽靶向性抗新生血管生成与抗肿瘤作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

缺氧诱导的microRNA调控TIP30表达促进胰腺癌转移作用机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

Tecto调节非洲爪蛙胚层决定与分化的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

PAI-1在肿瘤新生血管形成中对血管稳定性的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

DS证据理论中的基本概率指派智能化生成及应用研究

国家自然科学基金

0+阅读 · 2009年12月31日

DCC/UNC5H2对实验性脑梗死后远隔细胞凋亡和神经可塑性的作用

国家自然科学基金

0+阅读 · 2009年12月31日

Drp-1基因在内质网应激诱导胰岛β32454;胞凋亡中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员