The advent of Large Language Models (LLMs) has had a transformative impact on many fields. However, the potential for LLMs such as ChatGPT to be exploited to generate misinformation has raised serious concerns about online safety and public trust. A fundamental research question is: will LLM-generated misinformation cause more harm than human-written misinformation? We propose to tackle this question from the perspective of detection difficulty. We first build a taxonomy of LLM-generated misinformation. We then categorize and validate the potential real-world methods for generating misinformation with LLMs. Through extensive empirical investigation, we discover that LLM-generated misinformation can be harder for both humans and detectors to detect than human-written misinformation with the same semantics, which suggests that it can adopt more deceptive styles and potentially cause greater harm. Finally, we discuss the implications of our findings for combating misinformation in the age of LLMs, along with possible countermeasures.