In today's digital era, the rapid spread of misinformation threatens public well-being and societal trust. As online misinformation proliferates, manual verification by fact-checkers becomes increasingly difficult. We introduce FACT-GPT (Fact-checking Augmentation with Claim matching Task-oriented Generative Pre-trained Transformer), a framework designed to automate the claim matching phase of fact-checking using Large Language Models (LLMs). The framework identifies new social media content that either supports or contradicts claims previously debunked by fact-checkers. Our approach uses GPT-4 to generate a labeled dataset of simulated social media posts, which then serves as training data for fine-tuning more specialized LLMs. We evaluated FACT-GPT on an extensive dataset of social media content related to public health. The results indicate that our fine-tuned LLMs rival the performance of larger pre-trained LLMs in claim matching tasks and align closely with human annotations. This study makes three contributions: it provides an automated framework for enhanced fact-checking; it demonstrates the potential of LLMs to complement human expertise; and it offers public resources, including datasets and models, to further research and applications in the fact-checking domain.
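To make the claim matching setup concrete, here is a minimal sketch of how a (debunked claim, post) pair might be stored as a labeled example, serialized for fine-tuning, and scored against human annotations. The label names, the prompt layout, and the helper names `ClaimPair`, `to_training_record`, and `agreement` are assumptions made for illustration only; the abstract does not specify the label set or the data format.

```python
import json
from dataclasses import dataclass

# Hypothetical label set: the abstract names only "supports" and "contradicts";
# the third label for unrelated posts is an assumption.
LABELS = ("SUPPORTS", "CONTRADICTS", "UNRELATED")


@dataclass(frozen=True)
class ClaimPair:
    claim: str  # a claim previously debunked by fact-checkers
    post: str   # a real or simulated social media post
    label: str  # one of LABELS


def to_training_record(pair: ClaimPair) -> str:
    """Serialize one pair as a JSON line in an assumed prompt/completion layout."""
    if pair.label not in LABELS:
        raise ValueError(f"unknown label: {pair.label!r}")
    prompt = f"Claim: {pair.claim}\nPost: {pair.post}\nRelation:"
    return json.dumps({"prompt": prompt, "completion": pair.label},
                      ensure_ascii=False)


def agreement(predicted: list[str], human: list[str]) -> float:
    """Fraction of examples where the model label equals the human label."""
    if not human or len(predicted) != len(human):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == h for p, h in zip(predicted, human)) / len(human)


if __name__ == "__main__":
    # Placeholder strings, not data from the study.
    pair = ClaimPair(claim="<debunked claim>", post="<post text>", label="SUPPORTS")
    print(to_training_record(pair))
    print(agreement(["SUPPORTS", "UNRELATED"], ["SUPPORTS", "CONTRADICTS"]))
```

Simple agreement is used here only because it is easy to read; a study of this kind would more likely report per-class precision and recall or a chance-corrected agreement statistic.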