With the widespread use of social networks, detecting the topics discussed on them has become a significant challenge. Existing work is mainly based on frequent pattern mining or semantic relations and does not consider language structure. Language-structure methods aim to discover the relationships between words and how humans understand them. This paper therefore draws on the concept of imitating the mental ability of word association to propose a topic detection framework for social networks, based on the Human Word Association method. The method is evaluated on FA-CUP, a benchmark dataset in the field of topic detection. The results show that the proposed method improves on other methods in terms of topic recall and keyword F1. In addition, most previous work on topic detection is limited to English, while Persian, especially in microblogs, is considered a low-resource language. A dataset of Persian (Farsi) Telegram posts was therefore collected. Applying the proposed method to this dataset shows that it also outperforms other topic detection methods there.