Inferring locations from user-generated text on social media platforms is a challenging problem with implications for public safety. We propose a novel non-uniform grid-based approach to location inference from Twitter messages using Quadtree spatial partitions. The proposed algorithm uses natural language processing (NLP) for semantic understanding and incorporates cosine similarity and Jaccard similarity measures for feature vector extraction and dimensionality reduction. We chose Twitter as our experimental platform because of its popularity and its effectiveness in disseminating news and stories about recent events around the world. Our approach is the first of its kind to infer tweet locations using Quadtree spatial partitions and NLP with hybrid word-vector representations. On benchmark datasets, the proposed algorithm achieved significant classification accuracy and outperformed state-of-the-art grid-based content-only location inference methods by up to 24% in correctly predicting tweet locations within a 161 km radius and by 300 km in median error distance.
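To illustrate what a non-uniform grid built from Quadtree partitions looks like, the following is a minimal sketch, not the implementation evaluated here. It assumes that a cell is split into four equal quadrants whenever it holds more than `capacity` geotagged tweets, up to a depth limit. The names `QuadNode`, `build`, `leaves`, and `locate`, the parameter values, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class QuadNode:
    south: float
    west: float
    north: float
    east: float
    points: List[Point] = field(default_factory=list)
    children: List["QuadNode"] = field(default_factory=list)

    def contains(self, p: Point) -> bool:
        # Half-open cell: points on the north or east edge belong to the neighbour.
        lat, lon = p
        return self.south <= lat < self.north and self.west <= lon < self.east


def build(node: QuadNode, capacity: int, max_depth: int, depth: int = 0) -> None:
    """Recursively split cells that hold more than `capacity` points."""
    if len(node.points) <= capacity or depth >= max_depth:
        return
    mid_lat = (node.south + node.north) / 2
    mid_lon = (node.west + node.east) / 2
    node.children = [
        QuadNode(node.south, node.west, mid_lat, mid_lon),  # south-west
        QuadNode(node.south, mid_lon, mid_lat, node.east),  # south-east
        QuadNode(mid_lat, node.west, node.north, mid_lon),  # north-west
        QuadNode(mid_lat, mid_lon, node.north, node.east),  # north-east
    ]
    for p in node.points:
        for child in node.children:
            if child.contains(p):
                child.points.append(p)
                break
    node.points = []  # points now live only in the leaves
    for child in node.children:
        build(child, capacity, max_depth, depth + 1)


def leaves(node: QuadNode) -> List[QuadNode]:
    """Return the leaf cells, i.e. the cells of the non-uniform grid."""
    if not node.children:
        return [node]
    out: List[QuadNode] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def locate(node: QuadNode, p: Point) -> QuadNode:
    """Find the leaf cell containing p (p must lie inside the root cell)."""
    while node.children:
        node = next(c for c in node.children if c.contains(p))
    return node


# Hypothetical sample: a dense cluster near New York and one tweet near Sydney.
tweets = [(40.7, -74.0), (40.8, -73.9), (40.6, -74.1),
          (40.7, -73.8), (40.9, -74.2), (-33.9, 151.2)]
root = QuadNode(-90.0, -180.0, 90.0, 180.0, points=list(tweets))
build(root, capacity=2, max_depth=16)
cells = leaves(root)
```

In this sketch the dense cluster ends up in small cells while the isolated tweet stays in a large one, which is the property that distinguishes a Quadtree grid from a uniform grid. Each leaf cell could then serve as one class label for a text classifier.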
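The two reported measures, accuracy within a 161 km radius (roughly 100 miles) and median error distance, can be sketched as follows. This is an illustrative computation only: it assumes that errors are great-circle distances obtained with the haversine formula on a spherical Earth of radius 6371 km, and that each prediction is a single (latitude, longitude) point such as a cell centre. The function names and the sample coordinates are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0     # mean Earth radius, spherical approximation


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def error_distances(preds: Sequence[Point], golds: Sequence[Point]) -> List[float]:
    """Per-tweet distance between predicted and true location."""
    return [haversine_km(p, g) for p, g in zip(preds, golds)]


def accuracy_within(preds: Sequence[Point], golds: Sequence[Point],
                    radius_km: float = 161.0) -> float:
    """Fraction of predictions that fall within radius_km of the true location."""
    errors = error_distances(preds, golds)
    return sum(e <= radius_km for e in errors) / len(errors)


def median_error_km(preds: Sequence[Point], golds: Sequence[Point]) -> float:
    """Median of the per-tweet error distances."""
    return median(error_distances(preds, golds))


# Hypothetical example: one exact hit, one miss by 1 degree of latitude,
# and one prediction on the wrong continent.
golds = [(40.7, -74.0), (51.5, -0.1), (-33.9, 151.2)]
preds = [(40.7, -74.0), (52.5, -0.1), (40.7, -74.0)]
acc = accuracy_within(preds, golds)
med = median_error_km(preds, golds)
```

Here two of the three predictions fall within 161 km, and the median error is the distance spanned by one degree of latitude, about 111 km.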