Recent advances in the language generation capabilities of large language models (LLMs) such as GPT, Bard, and Llama raise concerns about their misuse to incite mass agitation and communal hatred by generating fake news and spreading misinformation. Traditional approaches to building a ground-truth dataset for misinformation do not scale well because of the extensive manual effort required to annotate the data. In this paper, we propose an LLM-based approach to creating silver-standard ground-truth datasets for identifying misinformation. Specifically, given a trusted news article, our approach involves prompting LLMs to automatically generate a summarised version of the original article. The prompts act as a control mechanism that introduces specific types of factual errors into the generated summaries, e.g., incorrect quantities or false attributions. To investigate the usefulness of this dataset, we conduct a set of experiments in which we train a range of supervised models for the task of misinformation detection.
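The prompt-controlled generation described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the instruction wording in `ERROR_INSTRUCTIONS`, the helper names `build_prompt` and `make_silver_examples`, the three-sentence summary length, the 0/1 labelling scheme, and the choice to also generate one faithful summary as a negative example. The LLM itself is passed in as a plain callable, so no particular model API is implied.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical instruction wording, one entry per error type named in the abstract.
# The paper's actual prompts are not reproduced here.
ERROR_INSTRUCTIONS: Dict[str, str] = {
    "incorrect_quantity": (
        "Change at least one number, date, or amount so that it "
        "contradicts the article."
    ),
    "false_attribution": (
        "Attribute at least one statement or action to a person or "
        "organisation other than the one named in the article."
    ),
}


def build_prompt(article: str, error_type: Optional[str] = None) -> str:
    """Build a summarisation prompt, optionally asking for one error type."""
    prompt = "Summarise the following news article in three sentences.\n"
    if error_type is not None:
        # The extra instruction is the control mechanism for the error type.
        prompt += ERROR_INSTRUCTIONS[error_type] + "\n"
    return prompt + "\nArticle:\n" + article


def make_silver_examples(
    article: str, generate: Callable[[str], str]
) -> List[dict]:
    """Produce silver-labelled summaries for one trusted article.

    `generate` is any function that maps a prompt to model output.
    Label 0 marks the summary requested without an error instruction,
    label 1 marks summaries requested with one (assumed scheme).
    """
    examples = [
        {"text": generate(build_prompt(article)), "label": 0, "error_type": None}
    ]
    for error_type in ERROR_INSTRUCTIONS:
        examples.append(
            {
                "text": generate(build_prompt(article, error_type)),
                "label": 1,
                "error_type": error_type,
            }
        )
    return examples
```

In such a setup the labels are only "silver" because they come from the prompt that was issued, not from a human check of what the model actually wrote. The resulting records could then serve as training data for a supervised misinformation classifier.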