Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou

From arXiv, 46 pages, 31 figures, ICML '24

We present an approach for estimating the fraction of text in a large corpus that is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e., beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews that report lower confidence, that were submitted close to the deadline, and that come from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text that may be too subtle to detect at the individual level, and discuss the implications of such trends for peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
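To make the corpus-level idea concrete, here is a minimal sketch of a maximum likelihood estimate of the AI-modified fraction. It is not the authors' implementation. It assumes the corpus can be modeled as a two-component mixture (1 - alpha) * P + alpha * Q, where P and Q are reference distributions fitted to expert-written and AI-generated text, and it fits alpha with a plain EM loop. The function name `estimate_alpha`, the three-word vocabulary, and all probabilities are hypothetical values chosen only for illustration.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def estimate_alpha(logp_human, logp_ai, iters=100):
    """Estimate the mixture weight alpha in (1 - alpha) * P + alpha * Q.

    logp_human[i] and logp_ai[i] are the log-likelihoods of document i
    under the human (P) and AI (Q) reference models. Assumption: the
    mixture form and the EM fitting loop are illustrative choices.
    """
    alpha = 0.5
    eps = 1e-9
    for _ in range(iters):
        prior_logit = math.log(alpha) - math.log(1.0 - alpha)
        # E-step: posterior probability that each document is from Q.
        resp = [_sigmoid(prior_logit + lq - lp)
                for lp, lq in zip(logp_human, logp_ai)]
        # M-step: the new alpha is the mean posterior, kept inside (0, 1).
        alpha = min(max(sum(resp) / len(resp), eps), 1.0 - eps)
    return alpha


# Toy reference distributions over a tiny vocabulary (hypothetical numbers).
VOCAB = ["novel", "commendable", "clear"]
P_HUMAN = {"novel": 0.50, "commendable": 0.05, "clear": 0.45}
P_AI = {"novel": 0.30, "commendable": 0.40, "clear": 0.30}

# Simulate a corpus in which 15% of one-word "documents" come from Q.
rng = random.Random(0)
TRUE_ALPHA = 0.15
docs = []
for _ in range(20000):
    source = P_AI if rng.random() < TRUE_ALPHA else P_HUMAN
    docs.append(rng.choices(VOCAB, weights=[source[w] for w in VOCAB])[0])

alpha_hat = estimate_alpha(
    [math.log(P_HUMAN[w]) for w in docs],
    [math.log(P_AI[w]) for w in docs],
)
print(f"estimated alpha: {alpha_hat:.3f} (simulated with {TRUE_ALPHA})")
```

Note that the estimate is a property of the whole corpus: no single document is labeled as AI-written, which is consistent with the abstract's point that some trends are visible only at the corpus level.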

