Social platforms have become crucial channels for disseminating information and discussing real-life social events, which offers researchers an excellent opportunity to design and implement novel event detection frameworks. However, most existing approaches exploit only keyword burstiness or network structure to detect unspecified events. Consequently, they often fail to identify such events because of the challenging nature of both events and social data. Social data, e.g., tweets, is characterized by misspellings, incompleteness, word sense ambiguity, and irregular language, as well as variation in the aspects of opinions expressed. Moreover, extracting discriminative features and patterns for evolving events from limited structural knowledge alone is almost infeasible. To address these challenges, in this thesis we propose a novel framework, EnrichEvent, that leverages the lexical and contextual representations of streaming social data. In particular, we combine contextual and lexical knowledge to detect semantically related tweets and thereby improve the effectiveness of event detection. Finally, the framework produces a cluster chain for each event to show how the event evolves over time. We conducted extensive experiments to evaluate the framework, and the results support its effectiveness in detecting and distinguishing unspecified social events.
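The idea of combining lexical and contextual signals and then chaining clusters over time can be illustrated with a minimal sketch. This is not the EnrichEvent implementation: the Jaccard token overlap, the cosine similarity over precomputed embedding vectors, the weight `alpha`, the `threshold`, and the function names are all assumptions chosen for illustration only.

```python
import math

def jaccard(text_a, text_b):
    """Lexical similarity (assumed measure): overlap of lowercase token sets."""
    tokens_a, tokens_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def cosine(u, v):
    """Contextual similarity (assumed measure): cosine of two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def combined_similarity(text_a, vec_a, text_b, vec_b, alpha=0.5):
    """Weighted mix of lexical and contextual similarity; alpha is hypothetical."""
    return alpha * jaccard(text_a, text_b) + (1 - alpha) * cosine(vec_a, vec_b)

def extend_chains(chains, clusters, threshold=0.5, alpha=0.5):
    """Attach each cluster of the current time window to the most similar
    chain tail from earlier windows, or start a new chain if none is close.

    Each cluster is a (summary_text, embedding_vector) pair.
    """
    # Snapshot the tails so clusters from the same window never link to each other.
    tails = [chain[-1] for chain in chains]
    for text, vec in clusters:
        best_index, best_score = None, threshold
        for i, (tail_text, tail_vec) in enumerate(tails):
            score = combined_similarity(text, vec, tail_text, tail_vec, alpha)
            if score >= best_score:
                best_index, best_score = i, score
        if best_index is None:
            chains.append([(text, vec)])
        else:
            chains[best_index].append((text, vec))
    return chains

# Two time windows with toy two-dimensional "embeddings".
chains = []
extend_chains(chains, [("flood in city", [1.0, 0.0])])
extend_chains(chains, [("city flood warning", [1.0, 0.0]),
                       ("football final tonight", [0.0, 1.0])])
print(len(chains))
```

In this toy run the flood cluster from the second window joins the existing chain, while the unrelated football cluster starts a new one. A real system would obtain the vectors from a language model and tune the weighting and threshold on data.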