Social Event Detection (SED) aims to identify significant events from social streams and has wide applications ranging from public opinion analysis to risk management. In recent years, Graph Neural Network (GNN)-based solutions have achieved state-of-the-art performance. However, GNN-based methods often struggle with noisy and missing edges between messages, which degrades the quality of the learned message embeddings. Moreover, these methods initialize node embeddings statically before training, which limits their ability to learn from message texts and relations simultaneously. In this paper, we approach social event detection from a new perspective based on Pre-trained Language Models (PLMs) and present RPLM_SED (Relational prompt-based Pre-trained Language Models for Social Event Detection). First, we propose a new pairwise message modeling strategy that organizes social messages into message pairs with multi-relational sequences. Second, we propose a new multi-relational prompt-based pairwise message learning mechanism that uses PLMs to learn more comprehensive message representations from message pairs with multi-relational prompts. Third, we design a new clustering constraint that optimizes the encoding process by enhancing intra-cluster compactness and inter-cluster dispersion, making message representations more distinguishable. We evaluate RPLM_SED on three real-world datasets and show that it achieves state-of-the-art performance on social event detection in offline, online, low-resource, and long-tail distribution scenarios.
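The abstract describes the clustering constraint only by its two goals: intra-cluster compactness and inter-cluster dispersion. The sketch below shows one common way to turn those goals into a scalar loss. It is an assumption, not the RPLM_SED formulation.

- The compactness term is a center-loss-style mean squared distance from each message embedding to its event center.
- The dispersion term is a squared hinge that penalizes pairs of event centers closer than a margin.

The function name `clustering_constraint`, the squared Euclidean metric, and the `margin` parameter are hypothetical choices made for illustration. The code uses plain Python lists to stay self-contained.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def _mean(vectors: Sequence[Vector]) -> Vector:
    # Component-wise mean of a non-empty list of equal-length vectors.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_constraint(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    margin: float = 1.0,
) -> float:
    """Hypothetical clustering constraint: compactness + dispersion.

    Lower values mean messages sit close to their own event center
    and event centers are at least `margin` apart.
    """
    if not embeddings:
        return 0.0

    # Group message embeddings by event label and compute one center per event.
    groups: Dict[int, List[Vector]] = {}
    for emb, lab in zip(embeddings, labels):
        groups.setdefault(lab, []).append(list(emb))
    centers = {lab: _mean(vs) for lab, vs in groups.items()}

    # Intra-cluster compactness: mean squared distance of each message to its center.
    compact = sum(
        _sq_dist(emb, centers[lab]) for emb, lab in zip(embeddings, labels)
    ) / len(embeddings)

    # Inter-cluster dispersion: squared hinge on center distances below the margin,
    # averaged over all unordered pairs of distinct events.
    labs = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(labs) for b in labs[i + 1:]]
    if pairs:
        disperse = sum(
            max(0.0, margin - math.sqrt(_sq_dist(centers[a], centers[b]))) ** 2
            for a, b in pairs
        ) / len(pairs)
    else:
        disperse = 0.0

    return compact + disperse


if __name__ == "__main__":
    # Two well-separated events, each with two messages at distance 1 from the center.
    emb = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
    print(clustering_constraint(emb, [0, 0, 1, 1]))
```

In this toy run the two event centers are 10 apart, so the dispersion term is zero and only the compactness term (1.0) remains. In an actual training loop such a term would be added to the main objective and computed on PLM message embeddings with a tensor library. That integration is not shown here.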