Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media

AI-Generated Texts (AIGTs) are increasingly present on social media platforms. Misuse of AIGTs could have profound implications for public opinion, for example by spreading misinformation and manipulating narratives. Despite the importance of this issue, a systematic study of the prevalence of AIGTs on social media is still lacking. To address this gap, this paper quantifies, monitors, and analyzes AIGTs on online social media platforms. We first collect a dataset (SM-D) of around 2.4M posts from three major social media platforms: Medium, Quora, and Reddit. We then construct a diverse dataset (AIGTBench) to train and evaluate AIGT detectors. AIGTBench combines popular open-source datasets with our own AIGT datasets, generated from social media texts by 12 LLMs, and serves as a benchmark for evaluating mainstream detectors. With this setup, we identify the best-performing detector (OSM-Det). We then apply OSM-Det to SM-D to track AIGTs over time and observe different trends in the AI Attribution Rate (AAR) across platforms from January 2022 to October 2024. Specifically, Medium and Quora exhibit marked increases in AAR, rising from 1.77% to 37.03% and from 2.06% to 38.95%, respectively. In contrast, Reddit shows slower growth, with AAR increasing from 1.31% to 2.45% over the same period. Further analysis indicates that AIGTs differ from human-written texts along several dimensions, including linguistic patterns, topic distributions, engagement levels, and the follower distribution of authors. We hope that our analysis and findings on AIGTs in social media will inform future research in this domain.
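As a rough illustration of how a per-platform, per-month AAR could be tallied, the sketch below assumes that AAR is the percentage of posts in a time window that a detector labels as AI-generated; the abstract does not give the exact definition. The function name `attribution_rate`, the record layout, and the toy records are hypothetical and are not taken from SM-D or OSM-Det.

```python
from collections import defaultdict


def attribution_rate(records):
    """Return {(platform, month): AAR in percent}.

    `records` is an iterable of (platform, month, is_ai) tuples, where
    `is_ai` is the detector's binary decision for one post. This layout
    is an assumption made for illustration only.
    """
    totals = defaultdict(int)   # posts seen per (platform, month)
    flagged = defaultdict(int)  # posts labelled AI-generated per (platform, month)
    for platform, month, is_ai in records:
        key = (platform, month)
        totals[key] += 1
        if is_ai:
            flagged[key] += 1
    # Percentage of flagged posts in each group
    return {key: 100.0 * flagged[key] / totals[key] for key in totals}


# Toy, made-up detector outputs (not real data)
toy_records = [
    ("Medium", "2022-01", False),
    ("Medium", "2022-01", True),
    ("Medium", "2022-01", False),
    ("Medium", "2022-01", False),
    ("Reddit", "2022-01", False),
]
rates = attribution_rate(toy_records)
```

Grouping by (platform, month) keeps each platform's trend separate, which matches the way the abstract reports the trends for Medium, Quora, and Reddit individually.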
