As large language models (LLMs) like ChatGPT have gained traction, an increasing number of news websites have begun using them to generate articles. However, not only can these language models produce factually inaccurate articles on reputable websites, but disreputable news sites can also use LLMs to mass-produce misinformation. To begin to understand this phenomenon, we present one of the first large-scale studies of the prevalence of synthetic articles within online news media. To do this, we train a DeBERTa-based synthetic news detector and classify over 15.90 million articles from 3,074 misinformation and mainstream news websites. We find that between January 1, 2022, and May 1, 2023, the relative number of synthetic news articles increased by 61.1% on mainstream websites and by 426% on misinformation sites. We find that this increase is largely driven by smaller, less popular websites. Analyzing the impact of the release of ChatGPT using an interrupted time series analysis, we show that while its release resulted in a marked increase in synthetic articles on small sites as well as misinformation news websites, there was not a corresponding increase on large mainstream news websites.