The rapid dissemination of misinformation through online social networks is a pressing problem. Its harmful consequences endanger human health, public safety, democracy, and the economy, so urgent action is required to address it. In this study, we construct MiDe22, a new human-annotated dataset of 5,284 English and 5,064 Turkish tweets labeled for misinformation. The tweets cover several recent events between 2020 and 2022, including the Russia-Ukraine war, the COVID-19 pandemic, and refugees. The dataset also records user engagements with each tweet in terms of likes, replies, retweets, and quotes. We further provide a detailed data analysis with descriptive statistics, along with the experimental results of a benchmark evaluation for misinformation detection.