As Large Language Models become more proficient, their misuse in coordinated disinformation campaigns is a growing concern. This study explores the capability of ChatGPT with GPT-3.5 to generate short-form disinformation claims about the war in Ukraine, both in general and about a specific event that lies beyond GPT-3.5's knowledge cutoff. Unlike prior work, we do not provide the model with human-written disinformation narratives in the prompt; the generated short claims are therefore hallucinations built from prior world knowledge and inference from a minimal prompt. With a straightforward prompting technique, we are able to bypass model safeguards and generate numerous short claims. We compare these against human-authored false claims about the war in Ukraine from ClaimReview, focusing on differences in their linguistic properties. We also evaluate whether AI authorship can be distinguished by human readers or by state-of-the-art authorship detection tools. We demonstrate that ChatGPT can produce realistic, target-specific disinformation claims, even about a specific post-cutoff event, and that these claims cannot be reliably distinguished by humans or existing automated tools.