While the world has been combating COVID-19 for over three years, an ongoing "infodemic" driven by the spread of fake news about the pandemic has also become a global issue. Fake news affects many aspects of our daily lives, including politics, public health, and economic activity. Readers may mistake fake news for real news and consequently have less access to authentic information. This phenomenon is likely to cause confusion among citizens and conflict in society. Fake news research currently faces several major challenges. First, it is difficult to accurately identify fake news in social media posts. Second, timely manual identification is infeasible because the volume of fake news is overwhelming. Third, the topics discussed in fake news are hard to distinguish because they closely resemble those in real news. The goal of this paper is to identify fake news on social media to help stop its spread. We present deep learning approaches and an ensemble approach for fake news detection. Our detection models achieved higher accuracy than those of previous studies, and the ensemble approach further improved detection performance. We discovered feature differences between fake and real news items, and found that adding these features to the sentence embeddings affected model performance. We applied a hybrid method to build models that recognize topics in posts. We found that half of the identified topics overlapped between fake news and real news, which could increase confusion among the public.