Considerable progress has been made in tackling the misrepresentation of information derived from reference articles in the domains of fact-checking and faithful summarization. However, one aspect remains unaddressed: identifying social media posts that manipulate information from the news articles they are associated with. This task is difficult, primarily because such posts frequently contain personal opinions. We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify the manipulated or inserted information. To study this task, we propose a data collection schema and curate a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance. Additionally, we develop a simple yet effective basic model that significantly outperforms LLMs on the ManiTweet dataset. Finally, we conduct an exploratory analysis of human-written tweets, revealing connections between manipulation and the domain and factuality of news articles, and showing that manipulated sentences are more likely to encapsulate the main story or consequences of a news outlet.