Considerable advances have been made in combating the misrepresentation of information drawn from reference articles in the domains of fact-checking and faithful summarization. However, one aspect remains unaddressed: identifying social media posts that manipulate information in their associated news articles. This task is particularly challenging because such posts are rife with personal opinions. We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and to identify the manipulated or inserted information. To study this task, we propose a data collection schema and curate ManiTweet, a dataset of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that the task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance. We further develop a simple yet effective basic model that significantly outperforms LLMs on the ManiTweet dataset. Finally, an exploratory analysis of human-written tweets reveals intriguing connections between manipulation and the domain and factuality of news articles, and shows that manipulated sentences are more likely to encapsulate the main story or consequences of a news article.