Geo-entity linking is the task of linking a location mention to the real-world geographic location it refers to. In this paper, we explore the challenging task of geo-entity linking for noisy, multilingual social media data. Few open-source multilingual geo-entity linking tools are available, and existing ones are often either rule-based, and thus break easily in social media settings, or LLM-based, and thus too expensive for large-scale datasets. We present a method that represents real-world locations as averaged embeddings of labeled, user-input location names and allows for selective prediction via an interpretable confidence score. We show that our approach improves geo-entity linking on a global, multilingual social media dataset, and we discuss progress and problems in evaluating at different geographic granularities.
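The core idea of representing each location by an averaged embedding and abstaining on low-confidence predictions can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions. Location names are taken to be already embedded by some multilingual encoder, which is left unspecified here. Each location's representation is the normalized mean of its labeled name embeddings. The confidence score is taken to be the cosine similarity to the nearest location embedding, and the linker abstains below a threshold. The function names, the toy vectors, the threshold value, and the choice of cosine similarity as the confidence score are all hypothetical; the paper's actual score may be defined differently.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def build_centroids(labeled: List[Tuple[str, Vector]]) -> Dict[str, Vector]:
    """Average the embeddings of all labeled names that map to the same location ID."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for loc_id, emb in labeled:
        acc = sums.setdefault(loc_id, [0.0] * len(emb))
        for i, x in enumerate(emb):
            acc[i] += x
        counts[loc_id] = counts.get(loc_id, 0) + 1
    # Normalize each mean vector so that a dot product equals cosine similarity.
    return {loc: normalize([s / counts[loc] for s in vec]) for loc, vec in sums.items()}


def link(query: Vector, centroids: Dict[str, Vector],
         threshold: float) -> Tuple[Optional[str], float]:
    """Return the nearest location and its similarity, or None if below threshold.

    ASSUMPTION: confidence = cosine similarity to the nearest averaged embedding.
    """
    q = normalize(query)
    best_loc, best_sim = None, -1.0
    for loc, c in centroids.items():
        sim = sum(a * b for a, b in zip(q, c))
        if sim > best_sim:
            best_loc, best_sim = loc, sim
    # Selective prediction: abstain when the best match is not similar enough.
    return (best_loc if best_sim >= threshold else None), best_sim


# Toy 3-d vectors standing in for a multilingual encoder's output (hypothetical).
labeled_names = [
    ("paris_fr", [0.9, 0.1, 0.0]),   # e.g. "Paris"
    ("paris_fr", [0.8, 0.2, 0.1]),   # e.g. "París"
    ("berlin_de", [0.1, 0.9, 0.0]),  # e.g. "Berlin"
]
toy_centroids = build_centroids(labeled_names)
print(link([0.85, 0.15, 0.05], toy_centroids, threshold=0.8))  # confident match
print(link([0.0, 0.0, 1.0], toy_centroids, threshold=0.8))     # abstains
```

Averaging name embeddings lets many noisy, multilingual spellings of a place contribute to one representation. Thresholding the similarity to the nearest location trades coverage for precision, which is what selective prediction requires.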