The truth is often obscured by the flood of rumors that spread alongside breaking news and popular topics. Because sufficient corpora from the same domain are available for model training, existing rumor detection algorithms perform well on yesterday's news. However, lacking both training data and prior expert knowledge, they are poor at spotting rumors about unforeseen events, especially those propagated in different languages (i.e., low-resource regimes). In this paper, we propose a unified contrastive transfer framework that detects rumors by adapting features learned from well-resourced rumor data to low-resource data. More specifically, we first represent each rumor circulated on social media as an undirected topology, and then train a Multi-scale Graph Convolutional Network via a unified contrastive paradigm. Our model explicitly breaks domain and/or language barriers via language alignment and a novel domain-adaptive contrastive learning mechanism. To enhance representation learning from a small set of target events, we reveal that the rumor-indicative signal is closely correlated with the uniformity of the distribution of these events. We design a target-wise contrastive training mechanism with three data augmentation strategies, capable of unifying the representations by distinguishing target events. Extensive experiments on four low-resource datasets collected from real-world microblog platforms demonstrate that our framework performs much better than state-of-the-art methods and exhibits a superior capacity for detecting rumors at early stages.
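The two building blocks named in the abstract, an undirected propagation graph fed to a graph convolution and a contrastive objective over the resulting representations, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the Multi-scale GCN, the language alignment step, or the augmentation strategies, so this sketch assumes a single plain GCN layer with symmetric normalization and a generic supervised contrastive loss. All function names, shapes, and the temperature value are hypothetical.

```python
import numpy as np


def normalized_adjacency(num_nodes, edges):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected graph.

    `edges` is a list of (parent, child) post indices; each reply edge is
    mirrored so the propagation structure is treated as undirected.
    """
    a = np.zeros((num_nodes, num_nodes))
    for parent, child in edges:
        a[parent, child] = 1.0
        a[child, parent] = 1.0
    a += np.eye(num_nodes)  # self-loops keep each node's own features
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, x, w):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(a_hat @ x @ w, 0.0)


def supervised_contrastive_loss(z, labels, temperature=0.5):
    """Generic supervised contrastive loss over graph-level embeddings.

    Assumption: samples sharing a label are positives for each other; this
    stands in for the paper's domain-adaptive objective, which the abstract
    does not detail.
    """
    labels = np.asarray(labels)
    norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    z = z / norms
    sim = z @ z.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    not_self = ~np.eye(len(labels), dtype=bool)
    denom = (np.exp(sim) * not_self).sum(axis=1, keepdims=True)
    log_prob = sim - np.log(denom)
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # skip anchors that have no positive partner
    per_anchor = -(log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return per_anchor.mean()


# Toy usage: a source post (node 0) with two replies, 4-dimensional features.
rng = np.random.default_rng(0)
a_hat = normalized_adjacency(3, [(0, 1), (0, 2)])
node_features = rng.normal(size=(3, 4))
weights = rng.normal(size=(4, 2))
graph_embedding = gcn_layer(a_hat, node_features, weights).mean(axis=0)
```

Mean pooling over nodes is used here only to obtain one vector per rumor; in a training loop, embeddings of several rumors would be stacked and passed to `supervised_contrastive_loss` together with their rumor labels.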