In this paper, we propose a framework for detecting topics in social media based on Human Word Association (HWA). Identifying the topics discussed in these media has become a significant challenge. Most work in this area targets English, and little has been done for Persian, especially for microblogs written in Persian. Moreover, existing work has focused on mining frequent patterns or semantic relationships and has largely ignored structural methods of language. The proposed framework uses HWA, a method that imitates the human mental ability to associate words. The method calculates the Associative Gravity Force, a measure of how strongly words are related, and uses it to generate a word graph. Topics are then extracted by embedding this graph and applying clustering methods. We apply the approach to a Persian-language dataset collected from Telegram and evaluate its performance in several experimental studies. The experimental results show that the approach performs better than other topic detection methods.
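The pipeline described above (association force, word graph, embedding, clustering) can be illustrated with a minimal Python sketch. It is not the framework's implementation: the abstract does not give the Associative Gravity Force formula or name the embedding and clustering algorithms, so the sketch assumes a Newtonian-style stand-in (word frequency as mass, token distance as distance), uses each word's normalized association profile as its embedding, and clusters with a small k-means. The function names and the English toy posts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def gravity_graph(posts):
    """Build a weighted word graph from tokenized posts.

    ASSUMPTION: the edge weight f(a) * f(b) / d^2 is a gravity-style
    stand-in, not the formula used in the paper.
    """
    freq = Counter(tok for post in posts for tok in post)
    graph = defaultdict(lambda: defaultdict(float))
    for post in posts:
        for (i, a), (j, b) in combinations(enumerate(post), 2):
            if a == b:
                continue
            force = freq[a] * freq[b] / (j - i) ** 2
            graph[a][b] += force
            graph[b][a] += force
    return graph


def embed(graph):
    """ASSUMPTION: embed each word as its L2-normalized row of edge weights."""
    vocab = sorted(graph)
    vectors = {}
    for w in vocab:
        row = [graph[w].get(v, 0.0) for v in vocab]
        norm = math.sqrt(sum(x * x for x in row))
        vectors[w] = [x / norm for x in row]
    return vectors


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    words = list(vectors)
    centers = [vectors[words[0]]]
    while len(centers) < k:
        far = max(words, key=lambda w: min(sq_dist(vectors[w], c) for c in centers))
        centers.append(vectors[far])
    for _ in range(iters):
        labels = {
            w: min(range(k), key=lambda c: sq_dist(vectors[w], centers[c]))
            for w in words
        }
        for c in range(k):
            members = [vectors[w] for w in words if labels[w] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def detect_topics(posts, k):
    """Run the full sketch: graph, embedding, clustering; return word sets."""
    labels = kmeans(embed(gravity_graph(posts)), k)
    topics = defaultdict(set)
    for word, cluster in labels.items():
        topics[cluster].add(word)
    return list(topics.values())


# Hypothetical toy posts (English tokens standing in for Persian Telegram posts).
posts = [
    ["election", "vote", "candidate", "debate"],
    ["vote", "candidate", "election"],
    ["football", "goal", "match", "league"],
    ["goal", "match", "football"],
]
topics = detect_topics(posts, k=2)
print(topics)
```

On this toy input the two groups of words never co-occur, so the sketch separates the election-related words from the football-related ones. A real corpus would need Persian tokenization and normalization, and the number of topics `k` would have to be chosen or estimated.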