Correcting misinformation on social media with a large language model

Real-world information, often multimodal, can be inaccurate or potentially misleading because of factual errors, outdated claims, missing context, misinterpretation, and more. Such "misinformation" is understudied, challenging to address, and harmful across many social domains, particularly on social media, where it can spread rapidly. Manual correction that identifies and explains its (in)accuracies is widely accepted but difficult to scale. While large language models (LLMs) can generate human-like language that could accelerate misinformation correction, they struggle with outdated information, hallucinations, and limited multimodal capabilities. We propose MUSE, an LLM augmented with vision-language modeling and web retrieval over relevant, credible sources. MUSE generates responses that determine whether, and which part(s) of, the given content may be inaccurate or potentially misleading, and that explain why with grounded references. We further define a comprehensive set of rubrics to measure response quality, ranging from the accuracy of identifications and factuality of explanations to the relevance and credibility of references. Results show that MUSE consistently produces high-quality outputs across diverse social media content (e.g., modalities, domains, political leanings), including content that has not previously been fact-checked online. Overall, MUSE outperforms GPT-4 by 37% and even high-quality responses from social media users by 29%. Our work provides a general methodological and evaluative framework for correcting misinformation at scale.
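
The flow described above (describe any visual content, retrieve evidence from credible sources, then generate a grounded explanation) can be illustrated with a minimal sketch. This is not the MUSE implementation: every name here (`Source`, `Correction`, `correct_post`, the `credibility` score and its 0.5 threshold, and the toy retriever and generator) is a hypothetical stand-in chosen for illustration, and a real system would replace the toy functions with a vision-language model, a web search component, and an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Source:
    url: str
    text: str
    credibility: float  # hypothetical score in [0, 1]; not taken from the paper


@dataclass
class Correction:
    is_misleading: bool
    explanation: str
    references: List[str]


def correct_post(
    post_text: str,
    image_description: str,
    retrieve: Callable[[str], List[Source]],
    generate: Callable[[str, List[Source]], Tuple[bool, str]],
    min_credibility: float = 0.5,  # assumed threshold, for illustration only
) -> Correction:
    # Step 1: combine the post text with a textual description of its image.
    content = f"{post_text} {image_description}".strip()
    # Step 2: retrieve candidate sources and keep only the credible ones.
    evidence = [s for s in retrieve(content) if s.credibility >= min_credibility]
    # Step 3: ask the generator for a verdict and an explanation tied to the evidence.
    is_misleading, explanation = generate(content, evidence)
    return Correction(is_misleading, explanation, [s.url for s in evidence])


# Toy stand-ins so the sketch runs without any model or network access.
def toy_retrieve(query: str) -> List[Source]:
    return [
        Source("https://example.org/report", "The event took place in 2019.", 0.9),
        Source("https://example.org/forum", "Unverified forum post.", 0.2),
    ]


def toy_generate(content: str, evidence: List[Source]) -> Tuple[bool, str]:
    # A stub: it flags the post whenever any credible evidence exists.
    if not evidence:
        return False, "No credible evidence was retrieved."
    return True, f"The post conflicts with retrieved evidence: {evidence[0].text}"


result = correct_post(
    "This photo shows yesterday's flood.",
    "A flooded street.",
    toy_retrieve,
    toy_generate,
)
print(result.is_misleading, result.references)
```

Passing the retriever and generator in as callables keeps the three stages separable, so each can be swapped or evaluated on its own; that is a design choice of this sketch rather than something the abstract specifies.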