A Unified Contrastive Transfer Framework with Propagation Structure for Boosting Low-Resource Rumor Detection

The truth is significantly hampered by the massive volume of rumors that spread along with breaking news or popular topics. Because sufficient training corpora can be gathered from the same domain, existing rumor detection algorithms show promising performance on yesterday's news. However, lacking substantial training data and prior expert knowledge, they are poor at spotting rumors about unforeseen events, especially those propagated in different languages (i.e., low-resource regimes). In this paper, we propose a unified contrastive transfer framework that detects rumors by adapting features learned from well-resourced rumor data to low-resource data with only few-shot annotations. More specifically, we first represent each rumor circulating on social media as an undirected topology to enhance the interaction of user opinions, and then train a Multi-scale Graph Convolutional Network via a unified contrastive paradigm to mine effective clues simultaneously from post semantics and propagation structure. Our model explicitly breaks the domain and/or language barriers via language alignment and a novel domain-adaptive contrastive learning mechanism. To generalize the representation learning well from a small set of annotated target events, we show that the rumor-indicative signal is closely correlated with the uniformity of the distribution of these events. We design a target-wise contrastive training mechanism with three event-level data augmentation strategies, which unifies the representations by distinguishing target events. Extensive experiments on four low-resource datasets collected from real-world microblog platforms demonstrate that our framework substantially outperforms state-of-the-art methods and exhibits a superior capacity for detecting rumors at early stages.
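The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the Multi-scale Graph Convolutional Network or the domain-adaptive loss, so the code below assumes a single generic graph-convolution layer and a standard supervised contrastive loss as stand-ins. All function names, the temperature value, and the toy data are hypothetical.

```python
import numpy as np

def undirected_adjacency(num_posts, reply_edges):
    """Symmetric adjacency with self-loops from (parent, child) reply pairs."""
    adj = np.eye(num_posts)
    for parent, child in reply_edges:
        adj[parent, child] = 1.0
        adj[child, parent] = 1.0  # drop edge direction: opinions interact both ways
    return adj

def gcn_layer(adj, features, weight):
    """One generic graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W)."""
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))  # degrees >= 1 due to self-loops
    norm_adj = adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)

def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Pull same-label event embeddings together and push the rest apart.

    Assumes at least one pair of events shares a label.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    z = embeddings / np.maximum(norms, 1e-12)  # guard against zero vectors
    sim = z @ z.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)
    # Log-softmax of each anchor's similarities over all other events.
    sim_max = sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * not_self
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0  # skip anchors with no same-label partner
    per_anchor = -(log_prob * positives).sum(axis=1)[has_pos] / pos_count[has_pos]
    return per_anchor.mean()

# Toy usage: one event with 4 posts, where post 0 is the source claim.
rng = np.random.default_rng(0)
adj = undirected_adjacency(4, [(0, 1), (0, 2), (1, 3)])
post_features = rng.normal(size=(4, 8))    # hypothetical post embeddings
weight = rng.normal(size=(8, 4))           # hypothetical layer weights
event_vector = gcn_layer(adj, post_features, weight).mean(axis=0)  # mean-pool posts

# Toy batch of event-level vectors with rumor (1) / non-rumor (0) labels.
batch = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
loss = supervised_contrastive_loss(batch, np.array([0, 0, 1, 1]))
```

In this sketch, mean-pooling the node outputs gives one vector per event, and the loss is then computed over a batch of such vectors. A transfer setting like the one in the abstract would mix source and target events in that batch, but how the paper weights or augments them is not stated here and is left out.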
