The advent of Large Language Models (LLMs) has had a transformative impact. However, the potential for LLMs such as ChatGPT to be exploited to generate misinformation poses a serious concern for online safety and public trust. A fundamental research question is: will LLM-generated misinformation cause more harm than human-written misinformation? We propose to tackle this question from the perspective of detection difficulty. We first build a taxonomy of LLM-generated misinformation. We then categorize and validate the potential real-world methods for generating misinformation with LLMs. Finally, through extensive empirical investigation, we find that LLM-generated misinformation can be harder for both humans and detectors to detect than human-written misinformation with the same semantics, which suggests that it can have more deceptive styles and potentially cause more harm. We also discuss the implications of this finding for combating misinformation in the age of LLMs, along with possible countermeasures.