MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models

The advent of large language models (LLMs) has revolutionized online content creation, but it has also made high-quality fake news much easier to generate. This misuse threatens the integrity of the digital environment and undermines ethical standards, so understanding the motivations and mechanisms behind LLM-generated fake news is crucial. In this study, we analyze the creation of fake news from a social psychology perspective and develop a comprehensive LLM-based theoretical framework, LLM-Fake Theory. We introduce a novel pipeline that automates the generation of fake news using LLMs, eliminating the need for manual annotation. Using this pipeline, we create a theoretically informed Machine-generated Fake news dataset, MegaFake, derived from the GossipCop dataset. We conduct comprehensive analyses to evaluate MegaFake. We believe that the dataset and our insights will contribute to future research on the detection and governance of fake news in the era of LLMs.
