Detecting events from social media data streams is attracting growing research attention. The core challenge is to extract discriminative information from social media data so that messages can be assigned to distinct events. Because social data is highly diverse and updated at high frequency, supervised approaches to detecting events from social messages are rarely practical. Recent work therefore learns discriminative information from social messages in an unsupervised manner by combining graph contrastive learning (GCL) with embedding clustering. However, benchmark methods have two intrinsic issues. First, conventional GCL explores only part of the available attributes, and only coarsely, so it does not sufficiently learn the discriminative information in social messages. Second, benchmark methods cluster the learned embeddings in the latent space using specific prior knowledge, which conflicts with the principle of the unsupervised learning paradigm. In this paper, we propose a novel unsupervised social media event detection method via hybrid graph contrastive learning and reinforced incremental clustering (HCRC). HCRC uses hybrid graph contrastive learning to comprehensively learn semantic and structural discriminative information from social messages, and reinforced incremental clustering to cluster efficiently in a solidly unsupervised manner. We conduct comprehensive experiments to evaluate HCRC on the Twitter and Maven datasets. The experimental results demonstrate that our approach yields consistent and significant performance gains: in the traditional incremental, semi-supervised incremental, and solidly unsupervised settings, it achieves maximum improvements of 53%, 45%, and 37%, respectively.