Countering Misinformation via Emotional Response Generation

The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion, and ultimately democracy. Previous research has shown that social correction can be an effective way to curb misinformation, by engaging directly in a constructive dialogue with users who spread, often in good faith, misleading messages. Although professional fact-checkers are crucial to debunking viral claims, they usually do not engage in conversations on social media. Consequently, significant effort has been made to automate the use of fact-checker material in social correction; however, no previous work has tried to integrate it with the style and pragmatics that are commonly employed in social media communication. To fill this gap, we present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs (linked to debunking articles), accounting for both SMP style and basic emotions, two factors that play a significant role in the credibility and spread of misinformation. To collect this dataset, we used a technique based on an author-reviewer pipeline, which efficiently combines LLMs and human annotators to obtain high-quality data. We also provide comprehensive experiments showing that models trained on our proposed dataset achieve significant improvements in output quality and generalization capabilities.

