On the Possibilities of AI-Generated Text Detection - 专知论文

会员服务 ·

0

样本 · INFORMS · 样本复杂度 · 信息理论 · 查准率/准确率 ·

2023 年 6 月 1 日

On the Possibilities of AI-Generated Text Detection

翻译：关于AI生成文本检测的可能性

Souradip Chakraborty,Amrit Singh Bedi,Sicheng Zhu,Bang An,Dinesh Manocha,Furong Huang

Our work focuses on the challenge of detecting outputs generated by Large Language Models (LLMs) to distinguish them from those generated by humans. This ability is of the utmost importance in numerous applications. However, the possibility of such discernment has been the subject of debate within the community. Therefore, a central question is whether we can detect AI-generated text and, if so, when. In this work, we provide evidence that it should almost always be possible to detect AI-generated text unless the distributions of human and machine-generated texts are exactly the same over the entire support. This observation follows from the standard results in information theory and relies on the fact that if the machine text becomes more human-like, we need more samples to detect it. We derive a precise sample complexity bound of AI-generated text detection, which tells how many samples are needed to detect AI-generated text. This gives rise to additional challenges of designing more complicated detectors that take in $n$ samples for detection (rather than just one), which is the scope of future research on this topic. Our empirical evaluations on various real and synthetic datasets support our claim about the existence of better detectors, demonstrating that AI-generated text detection should be achievable in the majority of scenarios. Our theory and results align with OpenAI's empirical findings, (in relation to sequence length), and we are the first to provide a solid theoretical justification for these outcomes.

翻译：本研究聚焦于检测大型语言模型（LLMs）生成的输出以区分人类与机器文本的挑战。这种区分能力在众多应用中至关重要。然而，学界对此类可辨别性一直存在争议。因此，核心问题在于我们能否检测AI生成文本，以及何时能够实现。本文证明：除非人类与机器生成文本的分布在完整支撑集上完全相同，否则AI生成文本几乎总可被检测。该结论基于信息论经典结论，依赖于一个事实——当机器文本愈发接近人类文本时，需要更多样本才能实现检测。我们推导了AI生成文本检测的精确样本复杂度界，揭示了检测所需样本数量。这引出了设计更复杂检测器的新挑战——该检测器需基于n个样本（而非单一文本）进行判断，这正是该领域未来研究的范畴。我们在多种真实与合成数据集上的实证评估支持了更优检测器存在的论断，表明在绝大多数场景中AI生成文本检测具有可行性。本研究的理论推导与实验结果与OpenAI的实证发现（关于序列长度的结论）吻合，且我们首次为这些结论提供了严谨的理论依据。

0

相关内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

EcN细菌修饰聚合物纳米粒子靶向肿瘤研究

国家自然科学基金

0+阅读 · 2015年12月31日

新泛素化修饰因子对Hedgehog信号通路调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

柽柳Dof转录因子的耐盐调控机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

Erbin介导细胞周期异常与肿瘤发生的关系

国家自然科学基金

0+阅读 · 2012年12月31日

细胞ATP生成异常- - Warburg效应的机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

长链非编码RNA HOTTIP参与小细胞肺癌耐药的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

CDC73基因异常在颌骨骨化纤维瘤发病中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

Tecto调节非洲爪蛙胚层决定与分化的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

滋养细胞合体化障碍参与恶性滋养细胞肿瘤耐药机制

国家自然科学基金

0+阅读 · 2009年12月31日

睾丸特异乳酸脱氢酶-C基因选择性剪接及其功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

The Importance of Distrust in AI

The Importance of Distrust in AI

Arxiv

0+阅读 · 2023年7月25日

DataComp: In search of the next generation of multimodal datasets

Arxiv

0+阅读 · 2023年7月25日

COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts

Arxiv

0+阅读 · 2023年7月24日

Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG

Arxiv

0+阅读 · 2023年7月24日

The Imitation Game: Detecting Human and AI-Generated Texts in the Era of Large Language Models

Arxiv

0+阅读 · 2023年7月22日

On Defense of the Hazard Ratio

Arxiv

0+阅读 · 2023年7月22日

On the Mechanics of NFT Valuation: AI Ethics and Social Media

Arxiv

0+阅读 · 2023年7月21日

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

Arxiv

0+阅读 · 2023年7月21日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

13+阅读 · 2019年2月19日

VIP会员

文章信息

相关主题

样本复杂度

查准率/准确率

最新内容

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

0+阅读 · 17分钟前

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

0+阅读 · 20分钟前

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

0+阅读 · 22分钟前

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

0+阅读 · 37分钟前

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

《韩国国防政策与军备出口：韩国安全与国防政策如何塑造其国防工业与军备出口格局》最新100页报告

专知会员服务

0+阅读 · 40分钟前

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

ICML 2026 | VOTP：用视频基础模型与最优传输，让离线偏好强化学习只需少量反馈

专知会员服务

5+阅读 · 6月16日

多模态代码智能综述：从视觉输入到可执行代码系统

多模态代码智能综述：从视觉输入到可执行代码系统

专知会员服务

6+阅读 · 6月16日

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

专知会员服务

5+阅读 · 6月16日

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

专知会员服务

5+阅读 · 6月16日

《通用大语言模型：无人机指挥与控制接口》最新40页

《通用大语言模型：无人机指挥与控制接口》最新40页

专知会员服务

15+阅读 · 6月16日

《通过小型无人机系统将情报能力“作战化”》

《通过小型无人机系统将情报能力“作战化”》

专知会员服务

6+阅读 · 6月16日

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

专知会员服务

10+阅读 · 6月16日

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

专知会员服务

21+阅读 · 6月15日

消耗优势：美军的“精确规模化”概念

消耗优势：美军的“精确规模化”概念

专知会员服务

8+阅读 · 6月15日

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

专知会员服务

9+阅读 · 6月15日

相关VIP内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《短程弹道再入飞行器拦截时间中的一项异常现象》

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

从燃煤战舰到算法战争：水面指挥的永恒要求

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

The Importance of Distrust in AI

The Importance of Distrust in AI

Arxiv

0+阅读 · 2023年7月25日

DataComp: In search of the next generation of multimodal datasets

Arxiv

0+阅读 · 2023年7月25日

COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts

Arxiv

0+阅读 · 2023年7月24日

Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG

Arxiv

0+阅读 · 2023年7月24日

The Imitation Game: Detecting Human and AI-Generated Texts in the Era of Large Language Models

Arxiv

0+阅读 · 2023年7月22日

On Defense of the Hazard Ratio

Arxiv

0+阅读 · 2023年7月22日

On the Mechanics of NFT Valuation: AI Ethics and Social Media

Arxiv

0+阅读 · 2023年7月21日

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

Arxiv

0+阅读 · 2023年7月21日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

13+阅读 · 2019年2月19日

相关基金

EcN细菌修饰聚合物纳米粒子靶向肿瘤研究

国家自然科学基金

0+阅读 · 2015年12月31日

新泛素化修饰因子对Hedgehog信号通路调控机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

柽柳Dof转录因子的耐盐调控机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

Erbin介导细胞周期异常与肿瘤发生的关系

国家自然科学基金

0+阅读 · 2012年12月31日

细胞ATP生成异常- - Warburg效应的机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

长链非编码RNA HOTTIP参与小细胞肺癌耐药的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

CDC73基因异常在颌骨骨化纤维瘤发病中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

Tecto调节非洲爪蛙胚层决定与分化的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

滋养细胞合体化障碍参与恶性滋养细胞肿瘤耐药机制

国家自然科学基金

0+阅读 · 2009年12月31日

睾丸特异乳酸脱氢酶-C基因选择性剪接及其功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员