Discovering emerging entities (EEs) is the problem of finding entities before they become established. Such entities can be critical for individuals, companies, and governments, and many of them can be discovered on social media platforms such as Twitter. They have been a focus of research in academia and industry in recent years. As with any machine learning problem, data availability is one of the major challenges. This paper proposes EEPT, an online clustering method that discovers EEs without requiring training on a dataset. Additionally, due to the lack of a suitable evaluation metric, this paper uses a new metric to evaluate the results. The results show that EEPT is promising and finds significant entities before they become established.