A crucial property of a rumor detection model is its ability to generalize, in particular its ability to detect emerging, previously unseen rumors. Past research has indicated that content-based rumor detection models (i.e., those using solely source posts as input) tend to perform less effectively on unseen rumors. At the same time, the potential of context-based models remains largely untapped. The main contribution of this paper is an in-depth evaluation of the performance gap between content-based and context-based models, specifically on detecting new, unseen rumors. Our empirical findings demonstrate that context-based models are still overly dependent on the information derived from the rumors' source post and tend to overlook the significant role that contextual information can play. We also study the effect of data split strategies on classifier performance. Based on our experimental results, we offer practical suggestions on how to minimize the effects of temporal concept drift in static datasets when training rumor detection methods.
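To make the point about data split strategies concrete, the following is a minimal sketch, not the paper's actual protocol. It contrasts a random post-level split with a split that holds out the most recent rumor events entirely. The toy dataset, the event names, and the helper functions `random_split`, `temporal_event_split`, and `shared_events` are all hypothetical and introduced only for illustration.

```python
import random
from datetime import date

# Hypothetical toy dataset of (post_id, event, post_date); not taken from the paper.
posts = [
    (i, event, date(2020, month, 1 + i))
    for i, (event, month) in enumerate(
        [("event_a", 1)] * 4 + [("event_b", 3)] * 4 + [("event_c", 6)] * 4
    )
]


def random_split(items, n_test, seed=0):
    """Shuffle posts and cut: posts from one event can land on both sides."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:-n_test], shuffled[-n_test:]


def temporal_event_split(items, n_test_events=1):
    """Hold out the most recent events so test rumors are unseen in training."""
    first_seen = {}
    for _, event, day in items:
        first_seen[event] = min(day, first_seen.get(event, day))
    # Order events by the date of their earliest post, oldest first.
    ordered_events = sorted(first_seen, key=first_seen.get)
    held_out = set(ordered_events[-n_test_events:])
    train = [p for p in items if p[1] not in held_out]
    test = [p for p in items if p[1] in held_out]
    return train, test


def shared_events(train, test):
    """Events that appear in both partitions (a sign of event leakage)."""
    return {p[1] for p in train} & {p[1] for p in test}


if __name__ == "__main__":
    rand_train, rand_test = random_split(posts, n_test=4)
    time_train, time_test = temporal_event_split(posts, n_test_events=1)
    print("random split, shared events:", shared_events(rand_train, rand_test))
    print("temporal event split, shared events:", shared_events(time_train, time_test))
```

Under the temporal event split, every test post belongs to an event that is absent from training and is later than all training posts. This approximates the "new, unseen rumor" setting discussed in the abstract. A random split gives no such guarantee.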