New events emerge over time, influencing the topics of rumors on social media. Current rumor detection benchmarks use random splits for their training, development and test sets, which typically results in topical overlap between them. Consequently, models trained on random splits may not perform well on rumor classification for previously unseen topics because of temporal concept drift. In this paper, we re-evaluate classification models on four popular rumor detection benchmarks using chronological instead of random splits. Our experimental results show that random splits can significantly overestimate predictive performance across all datasets and models. Therefore, we suggest that rumor detection models should always be evaluated using chronological splits to minimize topical overlap.
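To make the difference between the two splitting strategies concrete, here is a minimal sketch of a chronological split next to a random one. It is an illustration under stated assumptions, not the evaluation protocol used in the paper: the `date` and `topic` fields, the 70/10/20 proportions, the toy `posts` list, and all function names are hypothetical.

```python
import random
from datetime import date


def chronological_split(posts, train_frac=0.7, dev_frac=0.1):
    """Sort posts by date, then cut into train/dev/test (oldest posts go to train)."""
    ordered = sorted(posts, key=lambda p: p["date"])
    train_end = int(len(ordered) * train_frac)
    dev_end = train_end + int(len(ordered) * dev_frac)
    return ordered[:train_end], ordered[train_end:dev_end], ordered[dev_end:]


def random_split(posts, train_frac=0.7, dev_frac=0.1, seed=0):
    """Shuffle posts with a fixed seed, then cut with the same proportions."""
    shuffled = list(posts)
    random.Random(seed).shuffle(shuffled)
    train_end = int(len(shuffled) * train_frac)
    dev_end = train_end + int(len(shuffled) * dev_frac)
    return shuffled[:train_end], shuffled[train_end:dev_end], shuffled[dev_end:]


def topic_overlap(split_a, split_b):
    """Return the set of topics that occur in both splits."""
    return {p["topic"] for p in split_a} & {p["topic"] for p in split_b}


# Hypothetical toy data: each topic is tied to a short window of time,
# mimicking events that emerge one after another.
posts = [
    {"date": date(2020, 1, d), "topic": t}
    for d, t in enumerate(["A", "A", "A", "A", "B", "B", "B", "C", "D", "D"], start=1)
]

train_set, dev_set, test_set = chronological_split(posts)
rand_train, rand_dev, rand_test = random_split(posts)

print("chronological train/test topic overlap:", topic_overlap(train_set, test_set))
print("random train/test topic overlap:", topic_overlap(rand_train, rand_test))
```

Because the chronological split places every training post before every test post, topics that only emerge late in the timeline stay unseen during training, which is the setting the re-evaluation targets.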