The increased sophistication of large language models (LLMs), and the consequent quality of the multilingual text they generate, raise concerns about potential misuse for disinformation. While humans struggle to distinguish LLM-generated content from human-written text, the scholarly debate about its impact remains divided. Some argue that heightened fears are overblown due to natural ecosystem limitations, while others contend that specific "longtail" contexts face overlooked risks. Our study bridges this debate by providing the first empirical evidence of LLM presence in the latest real-world disinformation datasets, documenting the increase in machine-generated content following ChatGPT's release, and revealing crucial patterns across languages, platforms, and time periods.