Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media

Social media platforms are seeing a growing volume of AI-Generated Texts (AIGTs). Misuse of AIGTs could have profound implications for public opinion, for example by spreading misinformation and manipulating narratives. Despite the importance of this issue, it remains unclear how prevalent AIGTs are on social media. To address this gap, this paper quantifies and monitors AIGTs on online social media platforms. We first collect a dataset (SM-D) of around 2.4M posts from three major social media platforms: Medium, Quora, and Reddit. We then construct a diverse dataset (AIGTBench) to train and evaluate AIGT detectors. AIGTBench combines popular open-source datasets with our own AIGT datasets, generated from social media texts by 12 LLMs, and serves as a benchmark for evaluating mainstream detectors. With this setup, we identify the best-performing detector (OSM-Det). We then apply OSM-Det to SM-D to track AIGTs across social media platforms from January 2022 to October 2024, using the AI Attribution Rate (AAR) as the metric. Medium and Quora exhibit marked increases in AAR, rising from 1.77% to 37.03% and from 2.06% to 38.95%, respectively. In contrast, Reddit shows slower growth, with AAR increasing from 1.31% to 2.45% over the same period. Further analysis indicates that AIGTs on social media differ from human-written texts across several dimensions, including linguistic patterns, topic distributions, engagement levels, and the follower distribution of authors. We hope that our analysis and findings on AIGTs in social media can shed light on future research in this domain.
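The abstract does not spell out how AAR is computed. As a rough illustration only, the sketch below assumes AAR is the percentage of posts in a given platform and month that a detector labels as AI-generated. The function name `ai_attribution_rate`, the tuple layout, and the toy records are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def ai_attribution_rate(posts):
    """Return {(platform, month): AAR in percent}.

    `posts` is an iterable of (platform, month, is_ai) tuples, where
    `is_ai` is the detector's binary decision for one post.
    Assumption: AAR = 100 * (posts flagged as AI) / (all posts) per group.
    """
    totals = defaultdict(int)   # number of posts per (platform, month)
    flagged = defaultdict(int)  # number of posts flagged as AI per group
    for platform, month, is_ai in posts:
        key = (platform, month)
        totals[key] += 1
        if is_ai:
            flagged[key] += 1
    return {key: 100.0 * flagged[key] / totals[key] for key in totals}


# Hypothetical toy records, not data from SM-D.
sample = [
    ("Medium", "2022-01", False),
    ("Medium", "2022-01", False),
    ("Medium", "2022-01", False),
    ("Medium", "2022-01", True),
    ("Reddit", "2022-01", False),
    ("Reddit", "2022-01", False),
]
rates = ai_attribution_rate(sample)
print(rates)
```

Grouping by platform and month mirrors the kind of per-platform trend the abstract reports; an actual study might also need to account for detector error rates, which this sketch ignores.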
