False information can spread quickly on social media and negatively influence how citizens behave and respond to social events. To improve the detection of fake news, especially long texts that are harder to detect in full, we propose LTCR, a Long-Text Chinese Rumor detection dataset. LTCR provides a resource for accurately detecting misinformation, particularly complex fake news related to COVID-19. The dataset consists of 1,729 real and 500 fake news items, with average lengths of approximately 230 and 152 characters, respectively. We also propose \method, a Salience-aware Fake News Detection Model, which achieves the highest accuracy (95.85%), fake news recall (90.91%), and F-score (90.60%) on the dataset. (Dataset: https://github.com/Enderfga/DoubleCheck)
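Because the dataset is imbalanced (1,729 real versus 500 fake items), accuracy alone says little about how well fake news is caught, which is presumably why fake-news recall and F-score are reported alongside it. The following is a minimal sketch of how these quantities relate, not the authors' evaluation code. It assumes the fake class is the positive class and that "F-score" means F1; the function names and the confusion-matrix counts in the usage note are hypothetical.

```python
def majority_baseline_accuracy(n_real: int, n_fake: int) -> float:
    """Accuracy of a trivial classifier that always predicts the larger class."""
    return max(n_real, n_fake) / (n_real + n_fake)


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1, with 'fake' as the positive class.

    tp: fake items predicted fake    fp: real items predicted fake
    fn: fake items predicted real    tn: real items predicted real
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def implied_precision(f1: float, recall: float) -> float:
    """Precision implied by a given F1 and recall (assumption: F-score is F1)."""
    return f1 * recall / (2 * recall - f1)


# Class counts taken from the abstract: 1,729 real and 500 fake items.
baseline = majority_baseline_accuracy(1729, 500)
```

On the full dataset, always predicting "real" would already reach roughly 77.6% accuracy while having zero fake-news recall. As a usage illustration with made-up counts, `binary_metrics(tp=9, fp=1, fn=1, tn=39)` gives an accuracy of 0.96 with precision, recall, and F1 all equal to 0.9. Under the F1 assumption, `implied_precision(0.9060, 0.9091)` suggests a fake-news precision of about 0.90 for the reported figures.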