This paper proposes a framework for detecting topics in social media based on Human Word Association (HWA). Identifying the topics discussed in these media has become a critical challenge. Most work in this area targets English, but much has also been done for the Persian language, especially for microblogs written in Persian. Existing work also focuses mainly on mining frequent patterns or semantic relationships and ignores the structural methods of language. HWA imitates the human mental ability to associate words. It calculates the Associative Gravity Force, a parameter that shows how strongly words are related, and this parameter is used to generate a graph. Topics are then extracted by embedding the graph and applying clustering methods. The approach has been applied to a Persian-language dataset collected from Telegram, and several experimental studies have been performed to evaluate the framework's performance. The experimental results show that the approach works better than other topic detection methods.
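The pipeline described above (word association scores, then a graph, then an embedding, then clustering) can be illustrated with a minimal sketch. The abstract does not give the formula for the Associative Gravity Force, so the score below is a stand-in assumption: a co-occurrence count normalised by word frequencies. The embedding (rows of the weighted adjacency matrix) and the small k-means routine are likewise illustrative choices, and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import combinations
import math


def association_weights(docs):
    """Stand-in association score (NOT the paper's Associative Gravity Force):
    co-occurrence count divided by the geometric mean of the word frequencies."""
    freq, co = Counter(), Counter()
    for doc in docs:
        words = sorted(set(doc))
        freq.update(words)
        co.update(combinations(words, 2))
    return {
        (a, b): n / math.sqrt(freq[a] * freq[b])
        for (a, b), n in co.items()
    }


def embed(weights):
    """Embed each word as its row of the weighted adjacency matrix (an assumption)."""
    vocab = sorted({w for pair in weights for w in pair})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for (a, b), wt in weights.items():
        vectors[a][index[b]] = wt
        vectors[b][index[a]] = wt
    for w in vocab:
        vectors[w][index[w]] = 1.0  # a word is fully associated with itself
    return vectors


def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def kmeans(vectors, k, iterations=20):
    """Deterministic k-means with farthest-first initialisation."""
    words = sorted(vectors)
    centers = [vectors[words[0]]]
    while len(centers) < k:
        far = max(words, key=lambda w: min(dist(vectors[w], c) for c in centers))
        centers.append(vectors[far])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for w in words:
            nearest = min(range(k), key=lambda i: dist(vectors[w], centers[i]))
            clusters[nearest].append(w)
        # Recompute each center as the mean of its members; keep empty ones as-is.
        centers = [
            [sum(col) / len(c) for col in zip(*(vectors[w] for w in c))] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


def detect_topics(docs, k):
    """Association graph -> embedding -> clustering; each cluster is one topic."""
    return kmeans(embed(association_weights(docs)), k)


# Toy tokenised posts (hypothetical data, not from the Telegram dataset).
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "league"],
    ["vote", "election", "party"],
    ["party", "vote", "poll"],
]
topics = detect_topics(docs, k=2)
```

On this toy input, the words that co-occur in the sports posts and those that co-occur in the politics posts end up in separate clusters. A real system would replace the stand-in score with the Associative Gravity Force and would likely use a dedicated graph-embedding method.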