The COVID-19 outbreak has led to a global surge in Sinophobia, partly because of the spread of misinformation, disinformation, and fake news about China. In this paper, we report on the creation of a novel classifier that detects whether Chinese-language posts on Twitter are related to fake news about China. The classifier achieves an F1 score of 0.64 and an accuracy of 93%. We release the final model and a new training dataset of 18,425 tweets so that researchers can study Chinese-language fake news during the COVID-19 pandemic. We also introduce a new dataset, generated by our classifier, that tracks the dynamics of Chinese-language fake news during the early pandemic.
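An F1 score of 0.64 alongside an accuracy of 93% is the kind of gap that typically arises when the positive class is rare, since accuracy rewards correct negatives while F1 ignores them. The following is a minimal sketch of how the two metrics are computed from a binary confusion matrix. The counts are hypothetical, chosen only so the results land near the reported figures; they are not the classifier's actual confusion matrix, and the function names are illustrative.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# HYPOTHETICAL counts for 1,000 tweets, 100 of them positive.
# These are NOT taken from the paper.
tp, fp, fn, tn = 64, 36, 36, 864

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)

# Baseline that labels every tweet as negative: every positive is missed.
baseline_acc = accuracy(0, 0, tp + fn, fp + tn)
baseline_f1 = f1_score(0, 0, tp + fn)

print(f"classifier: accuracy={acc:.3f}, F1={f1:.3f}")
print(f"all-negative baseline: accuracy={baseline_acc:.3f}, F1={baseline_f1:.3f}")
```

Under these assumed counts, a baseline that never flags a tweet still reaches 90% accuracy while its F1 is zero, which is why F1 is the more informative of the two figures on an imbalanced task.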